0%| | 0/2230 [00:00> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:11:34,475 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:11:35,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:11:36,346 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:11:37,525 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:11:38,171 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:11:39,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:11:40,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:11:41,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:11:41,856 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:11:43,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:11:43,655 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:11:44,827 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 17:11:45,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:11:46,749 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:11:47,409 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:11:48,596 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:11:49,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:11:50,422 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:11:51,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:11:52,220 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:11:52,874 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:11:54,018 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:11:54,651 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:11:55,809 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 17:11:56,457 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:11:57,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:11:58,665 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:11:59,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:00,418 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:01,579 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:02,220 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%| | 1/2230 [00:30<18:41:39, 30.19s/it][WARNING|modeling_bart.py:1051] 2022-03-26 17:12:03,403 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:04,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:05,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:05,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:06,920 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 17:12:07,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:08,705 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:09,325 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:10,463 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:11,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:12,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:12,858 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:13,986 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:14,626 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:15,750 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:16,393 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:17,496 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 17:12:18,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:19,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:19,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:21,018 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:21,635 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:22,804 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:23,456 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:24,594 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:25,210 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:26,357 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:27,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:28,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 17:12:28,722 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:29,871 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:30,516 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%| | 2/2230 [00:58<18:01:35, 29.13s/it][WARNING|modeling_bart.py:1051] 2022-03-26 17:12:31,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:32,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:33,560 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:34,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:35,264 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:35,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:37,000 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:37,636 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:38,764 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 17:12:39,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:40,498 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:41,114 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:42,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:42,868 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:43,998 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:44,636 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:45,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:46,364 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:47,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:48,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:49,208 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 17:12:49,837 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:50,943 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:51,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:52,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:53,307 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:54,426 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:55,035 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:56,153 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:56,782 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 8.6797, 'learning_rate': 1.2e-06, 'epoch': 0.01} [WARNING|modeling_bart.py:1051] 2022-03-26 17:12:57,898 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:12:58,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%| | 3/2230 [01:26<17:42:13, 28.62s/it][WARNING|modeling_bart.py:1051] 2022-03-26 17:12:59,755 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 17:13:00,342 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:13:01,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:13:02,019 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:13:03,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:13:03,753 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:13:04,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:13:05,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:13:06,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:13:07,224 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:13:08,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:13:08,935 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:13:10,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 17:13:10,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:13:11,752 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:13:12,354 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:13:13,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:13:14,062 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:13:15,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:13:15,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:13:16,892 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:13:17,513 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:13:18,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:13:19,232 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:13:20,345 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 17:13:20,947 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:13:22,035 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:13:22,636 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:13:23,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:13:24,358 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:13:25,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:13:26,065 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▏ | 4/2230 [01:54<17:25:37, 28.18s/it][WARNING|modeling_bart.py:1051] 2022-03-26 17:13:27,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:13:27,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 17:13:30,640 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:34,014 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:13:37,421 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:13:40,803 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:13:44,140 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:47,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:13:50,897 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 0%|▏ | 5/2230 [02:21<17:11:18, 27.81s/it][WARNING|modeling_bart.py:1051] 2022-03-26 17:13:54,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:13:57,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:01,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:14:04,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:14:07,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:14:11,200 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:14:14,594 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:17,955 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 0%|▏ | 6/2230 [02:48<17:00:49, 27.54s/it][WARNING|modeling_bart.py:1051] 2022-03-26 17:14:21,411 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:14:24,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:14:29,217 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:32,509 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:14:35,867 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:14:39,254 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:14:42,658 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:45,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
0%|▏ | 7/2230 [03:16<17:05:31, 27.68s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:49,426 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 8.2504, 'learning_rate': 3.6e-06, 'epoch': 0.03}
[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:52,810 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:56,169 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:59,460 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:02,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:06,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:09,530 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:12,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
0%|▎ | 8/2230 [03:43<16:55:52, 27.43s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:19,581 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:22,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:26,098 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:26,098 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:29,447 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 7.6117, 'learning_rate': 4.8e-06, 'epoch': 0.04} [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 7.1625, 'learning_rate': 5.399999999999999e-06, 'epoch': 0.04} [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 7.035, 'learning_rate': 5.999999999999999e-06, 'epoch': 0.05} [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 6.8002, 'learning_rate': 6.599999999999999e-06, 'epoch': 0.05} [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 6.5852, 'learning_rate': 7.2e-06, 'epoch': 0.06} [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 6.4332, 'learning_rate': 7.799999999999998e-06, 'epoch': 0.06} [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 6.2015, 'learning_rate': 8.4e-06, 'epoch': 0.07} [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 6.037, 'learning_rate': 8.999999999999999e-06, 'epoch': 0.07} [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 5.9392, 'learning_rate': 9.6e-06, 'epoch': 0.08}
{'loss': 5.7954, 'learning_rate': 1.02e-05, 'epoch': 0.08}
{'loss': 5.6019, 'learning_rate': 1.0799999999999998e-05, 'epoch': 0.09}
{'loss': 5.6175, 'learning_rate': 1.14e-05, 'epoch': 0.09}
{'loss': 5.481, 'learning_rate': 1.1999999999999999e-05, 'epoch': 0.09}
{'loss': 5.4746, 'learning_rate': 1.26e-05, 'epoch': 0.1}
1%|▊ | 23/2230 [10:05<15:04:54, 24.60s/it]
{'loss': 5.3499, 'learning_rate': 1.3199999999999997e-05, 'epoch': 0.11}
{'loss': 5.2045, 'learning_rate': 1.3799999999999998e-05, 'epoch': 0.11}
[WARNING|modeling_bart.py:1051] 2022-03-26 17:22:41,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.2217, 'learning_rate': 1.44e-05, 'epoch': 0.12}
{'loss': 5.0872, 'learning_rate': 1.4999999999999999e-05, 'epoch': 0.12}
{'loss': 5.1305, 'learning_rate': 1.5599999999999996e-05, 'epoch': 0.13}
[WARNING|modeling_utils.py:388] 2022-03-26 17:23:48,866 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:23:53,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
1%|█ | 29/2230 [12:26<14:19:49, 23.44s/it]
{'loss': 4.9678, 'learning_rate': 1.6199999999999997e-05, 'epoch': 0.13}
[WARNING|modeling_utils.py:388] 2022-03-26 17:24:15,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
1%|█ | 30/2230 [12:49<14:09:24, 23.17s/it]
{'loss': 5.0943, 'learning_rate': 1.68e-05, 'epoch': 0.13}
1%|█ | 31/2230 [13:11<14:00:47, 22.94s/it]
{'loss': 5.0962, 'learning_rate': 1.74e-05, 'epoch': 0.14}
[WARNING|modeling_utils.py:388] 2022-03-26 17:24:58,500 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:25:02,590 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:25:06,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.9747, 'learning_rate': 1.7999999999999997e-05, 'epoch': 0.14}
[WARNING|modeling_utils.py:388] 2022-03-26 17:25:23,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:25:27,202 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.9798, 'learning_rate': 1.8599999999999998e-05, 'epoch': 0.15}
[WARNING|modeling_utils.py:388] 2022-03-26 17:25:39,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:25:49,631 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.9415, 'learning_rate': 1.92e-05, 'epoch': 0.15}
[WARNING|modeling_utils.py:388] 2022-03-26 17:25:53,635 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:26:03,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 17:26:13,898 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 17:26:22,006 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 17:26:30,378 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.9001, 'learning_rate': 2.04e-05, 'epoch': 0.16}
[WARNING|modeling_bart.py:1051] 2022-03-26 17:26:36,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:26:42,898 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 17:26:50,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.995, 'learning_rate': 2.1e-05, 'epoch': 0.17}
[WARNING|modeling_bart.py:1051] 2022-03-26 17:26:55,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:27:03,339 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 17:27:07,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
2%|█▎ | 38/2230 [15:39<12:36:33, 20.71s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 17:27:13,534 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:27:19,427 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:27:21,741 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:27:24,053 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:27:29,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.9133, 'learning_rate': 2.2199999999999998e-05, 'epoch': 0.17}
[WARNING|modeling_bart.py:1051] 2022-03-26 17:27:34,146 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:27:36,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:27:41,955 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:27:44,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:27:46,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▍ | 40/2230 [16:15<11:51:56, 19.51s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 17:27:50,134 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:27:52,290 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:27:54,412 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 17:27:58,247 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:00,320 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:02,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:04,370 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:06,500 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:08,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:10,524 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:12,478 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:14,430 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:16,345 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:18,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:20,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:22,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:23,978 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:25,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:27,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:31,199 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:32,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:34,721 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:36,565 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:38,286 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:39,978 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:43,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:44,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:47,263 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:48,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:50,556 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:53,670 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:55,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:56,686 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:58,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:01,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:02,620 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:05,332 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:06,675 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:09,291 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:10,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:13,141 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:15,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:17,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:19,031 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:21,277 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:23,519 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:25,637 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:27,654 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:29,603 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:31,616 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:33,427 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:36,035 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:36,871 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:39,449 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:41,679 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:43,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.6317, 'learning_rate': 2.88e-05, 'epoch': 0.22}
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:47,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:51,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:54,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:58,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:01,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:05,500 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:09,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:12,543 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:16,168 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:19,644 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:23,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:26,582 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:30,096 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:33,556 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:37,029 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:40,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 6.0773, 'learning_rate': 2.9999999999999997e-05, 'epoch': 0.23}
[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:44,080 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:47,502 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:50,980 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:54,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:57,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:01,277 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:04,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:08,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.8237, 'learning_rate': 3.06e-05, 'epoch': 0.24}
[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:11,668 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:14,989 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:18,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:21,750 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:25,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:28,473 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:31,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:35,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.4014, 'learning_rate': 3.119999999999999e-05, 'epoch': 0.24}
[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:38,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.0832, 'learning_rate': 3.1799999999999994e-05, 'epoch': 0.25}
3%|█▉ | 56/2230 [20:57<15:10:35, 25.13s/it]
{'loss': 4.8432, 'learning_rate': 3.2399999999999995e-05, 'epoch': 0.25}
{'loss': 4.8566, 'learning_rate': 3.2999999999999996e-05, 'epoch': 0.26}
{'loss': 4.8108, 'learning_rate': 3.36e-05, 'epoch': 0.26}
{'loss': 4.7935, 'learning_rate': 3.42e-05, 'epoch': 0.26}
{'loss': 4.8346, 'learning_rate': 3.48e-05, 'epoch': 0.27}
3%|██▏ | 61/2230 [23:11<15:45:05, 26.14s/it]
{'loss': 4.8708, 'learning_rate': 3.539999999999999e-05, 'epoch': 0.27}
{'loss': 4.7303, 'learning_rate': 3.5999999999999994e-05, 'epoch': 0.28}
{'loss': 4.8369, 'learning_rate': 3.6599999999999995e-05, 'epoch': 0.28}
3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it]
{'loss': 4.7123, 'learning_rate': 3.7199999999999996e-05, 'epoch': 0.29}
{'loss': 4.6835, 'learning_rate': 3.78e-05, 'epoch': 0.29}
{'loss': 4.7015, 'learning_rate': 3.84e-05, 'epoch': 0.3}
{'loss': 4.6597, 'learning_rate': 3.9e-05, 'epoch': 0.3}
  3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it]
{'loss': 4.7459, 'learning_rate': 4.02e-05, 'epoch': 0.31}
{'loss': 4.6883, 'learning_rate': 4.08e-05, 'epoch': 0.31}
{'loss': 4.6348, 'learning_rate': 4.14e-05, 'epoch': 0.32}
{'loss': 4.6861, 'learning_rate': 4.2e-05, 'epoch': 0.32}
{'loss': 4.5766, 'learning_rate': 4.259999999999999e-05, 'epoch': 0.33}
{'loss': 4.6444, 'learning_rate': 4.319999999999999e-05, 'epoch': 0.33}
{'loss': 4.7433, 'learning_rate': 4.3799999999999994e-05, 'epoch': 0.34}
{'loss': 4.6555, 'learning_rate': 4.4399999999999995e-05, 'epoch': 0.34}
{'loss': 4.5683, 'learning_rate': 4.4999999999999996e-05, 'epoch': 0.35}
[WARNING|modeling_utils.py:388] 2022-03-26 17:41:42,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.6298, 'learning_rate': 4.56e-05, 'epoch': 0.35}
{'loss': 4.5865, 'learning_rate': 4.62e-05, 'epoch': 0.35}
[WARNING|modeling_utils.py:388] 2022-03-26 17:42:15,179 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.6905, 'learning_rate': 4.68e-05, 'epoch': 0.36}
[WARNING|modeling_bart.py:1051] 2022-03-26 17:42:49,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.5504, 'learning_rate': 4.7399999999999993e-05, 'epoch': 0.36}
{'loss': 4.6262, 'learning_rate': 4.7999999999999994e-05, 'epoch': 0.37}
[WARNING|modeling_bart.py:1051] 2022-03-26 17:43:20,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 17:43:34,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.5915, 'learning_rate': 4.8599999999999995e-05, 'epoch': 0.37}
[WARNING|modeling_utils.py:388] 2022-03-26 17:43:38,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:43:45,009 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:43:55,316 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.6035, 'learning_rate': 4.9199999999999997e-05, 'epoch': 0.38}
[WARNING|modeling_bart.py:1051] 2022-03-26 17:44:01,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 17:44:06,954 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:44:17,237 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:44:23,500 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:44:29,749 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:44:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.493, 'learning_rate': 5.04e-05, 'epoch': 0.39}
[WARNING|modeling_utils.py:388] 2022-03-26 17:44:43,525 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 17:44:47,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 17:44:51,961 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
4%|███ | 87/2230 [33:23<12:18:14, 20.67s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 17:44:58,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:45:04,065 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:10,439 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 17:45:14,379 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.5418, 'learning_rate': 5.1599999999999994e-05, 'epoch': 0.39}
[WARNING|modeling_utils.py:388] 2022-03-26 17:45:20,327 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:45:22,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:26,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:29,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 17:45:32,890 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:37,043 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:39,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:44,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:46,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:49,124 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:51,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:53,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:55,721 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 17:45:59,286 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:01,351 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:03,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:05,418 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:07,443 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:09,555 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:11,537 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:13,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:15,432 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:17,357 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:19,241 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:21,119 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:23,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:25,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:26,839 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:28,654 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:30,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:34,011 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:35,749 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:37,544 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:39,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:41,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:42,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:44,477 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:47,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:51,844 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:53,564 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:55,103 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:58,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:46:59,628 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:01,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:03,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:05,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:08,251 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:09,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:12,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:13,410 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:16,040 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:18,457 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:19,639 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:21,930 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:24,156 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:26,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:28,424 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:30,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:32,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:34,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:36,071 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:38,712 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:40,480 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:42,097 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:44,364 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:45,774 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:48,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:52,507 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:56,037 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:47:59,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:48:03,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:48:06,631 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:48:10,112 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:48:13,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:48:17,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:48:20,828 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:48:24,287 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:48:27,713 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:48:31,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:48:34,605 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:48:38,033 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:48:41,490 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 5.8072, 'learning_rate': 5.9999999999999995e-05, 'epoch': 0.46}
[WARNING|modeling_utils.py:388] 2022-03-26 17:48:45,177 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:48:48,586 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:48:51,976 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:48:55,398 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:48:58,816 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:49:02,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:49:05,585 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:49:09,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:49:12,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:49:15,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:49:19,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:49:22,444 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:49:25,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:49:29,183 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:49:32,496 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:49:35,871 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:49:39,380 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:49:39,380 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:49:42,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:49:42,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:49:42,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:49:42,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:49:42,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:49:42,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
5%|███▋ | 105/2230 [38:32<14:15:38, 24.16s/it]
{'loss': 4.9539, 'learning_rate': 6.18e-05, 'epoch': 0.47}
{'loss': 4.8815, 'learning_rate': 6.239999999999999e-05, 'epoch': 0.48}
{'loss': 4.6952, 'learning_rate': 6.299999999999999e-05, 'epoch': 0.48}
{'loss': 4.7131, 'learning_rate': 6.359999999999999e-05, 'epoch': 0.48}
5%|███▊ | 109/2230 [40:19<15:17:24, 25.95s/it]
{'loss': 4.6939, 'learning_rate': 6.419999999999999e-05, 'epoch': 0.49}
{'loss': 4.6896, 'learning_rate': 6.479999999999999e-05, 'epoch': 0.49}
{'loss': 4.6205, 'learning_rate': 6.539999999999999e-05, 'epoch': 0.5}
{'loss': 4.6257, 'learning_rate': 6.599999999999999e-05, 'epoch': 0.5}
{'loss': 4.63, 'learning_rate': 6.659999999999999e-05, 'epoch': 0.51}
{'loss': 4.5141, 'learning_rate': 6.72e-05, 'epoch': 0.51}
{'loss': 4.5986, 'learning_rate': 6.78e-05, 'epoch': 0.52}
{'loss': 4.55, 'learning_rate': 6.84e-05, 'epoch': 0.52}
5%|████ | 117/2230 [43:42<14:45:37, 25.15s/it]
{'loss': 4.5866, 'learning_rate': 6.9e-05, 'epoch': 0.52}
5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]
{'loss': 4.5122, 'learning_rate': 6.96e-05, 'epoch': 0.53}
{'loss': 4.5169, 'learning_rate': 7.02e-05, 'epoch': 0.53}
5%|████▏ | 120/2230 [44:56<14:35:19, 24.89s/it]
{'loss': 4.5464, 'learning_rate': 7.079999999999999e-05, 'epoch': 0.54}
5%|████▏ | 121/2230 [45:20<14:27:50, 24.69s/it]
{'loss': 4.4779, 'learning_rate': 7.139999999999999e-05, 'epoch': 0.54}
[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.517, 'learning_rate': 7.199999999999999e-05, 'epoch': 0.55}
{'loss': 4.4728, 'learning_rate': 7.259999999999999e-05, 'epoch': 0.55}
{'loss': 4.4484, 'learning_rate': 7.319999999999999e-05, 'epoch': 0.56}
{'loss': 4.6016, 'learning_rate': 7.379999999999999e-05, 'epoch': 0.56}
[WARNING|modeling_utils.py:388] 2022-03-26 17:58:37,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.4771, 'learning_rate': 7.439999999999999e-05, 'epoch': 0.57}
{'loss': 4.5307, 'learning_rate': 7.5e-05, 'epoch': 0.57}
[WARNING|modeling_utils.py:388] 2022-03-26 17:59:18,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 17:59:37,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.5465, 'learning_rate': 7.56e-05, 'epoch': 0.57}
[WARNING|modeling_utils.py:388] 2022-03-26 17:59:37,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:59:37,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:59:37,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:59:37,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:59:37,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 17:59:37,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... {'loss': 4.5136, 'learning_rate': 7.62e-05, 'epoch': 0.58} 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.5539, 'learning_rate': 7.68e-05, 'epoch': 0.58} 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 18:00:42,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:00:46,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.549, 'learning_rate': 7.74e-05, 'epoch': 0.59}
[WARNING|modeling_utils.py:388] 2022-03-26 18:00:50,644 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:01:01,093 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:01:09,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3814, 'learning_rate': 7.8e-05, 'epoch': 0.59}
[WARNING|modeling_utils.py:388] 2022-03-26 18:01:17,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
6%|████▌ | 133/2230 [49:59<13:08:01, 22.55s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 18:01:33,749 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:01:37,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:01:48,157 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
6%|████▋ | 134/2230 [50:20<12:51:56, 22.10s/it]
{'loss': 4.492, 'learning_rate': 7.92e-05, 'epoch': 0.6}
[WARNING|modeling_utils.py:388] 2022-03-26 18:02:02,569 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:02:10,312 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.5401, 'learning_rate': 7.98e-05, 'epoch': 0.61}
[WARNING|modeling_utils.py:388] 2022-03-26 18:02:16,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 18:02:32,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.4804, 'learning_rate': 8.04e-05, 'epoch': 0.61}
[WARNING|modeling_utils.py:388] 2022-03-26 18:02:40,954 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 18:02:49,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 18:02:53,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.5568, 'learning_rate': 8.1e-05, 'epoch': 0.61}
[WARNING|modeling_utils.py:388] 2022-03-26 18:02:59,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:03:11,297 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.4821, 'learning_rate': 8.16e-05, 'epoch': 0.62}
[WARNING|modeling_utils.py:388] 2022-03-26 18:03:17,405 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 18:03:21,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 18:03:25,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:03:27,920 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 18:03:32,105 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 18:03:36,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 18:03:40,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:03:42,336 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 18:03:46,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:03:48,208 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:03:50,392 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:03:52,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 18:03:56,487 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:03:58,556 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:00,610 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:02,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:04,683 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:06,706 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:08,792 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:10,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:12,668 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:14,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:16,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:18,421 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:20,295 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:22,176 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:24,132 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:25,960 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:27,768 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:31,332 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:33,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:34,824 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:36,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:38,345 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:40,034 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:43,370 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:44,941 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:46,565 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:48,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:50,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:53,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:55,243 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:56,752 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:04:59,648 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:01,024 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:05:03,826 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:05:05,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:05:07,834 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:05:09,124 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:05:11,659 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:14,183 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:05:15,369 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:05:17,702 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:05:18,854 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:05:21,104 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:23,355 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:05:25,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:05:27,422 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:05:29,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:05:31,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:05:34,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:35,804 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:05:37,447 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:05:39,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:05:42,124 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:05:43,467 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.4715, 'learning_rate': 8.879999999999999e-05, 'epoch': 0.67} [WARNING|modeling_bart.py:1051] 2022-03-26 18:05:47,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:05:51,363 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:05:54,901 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:05:58,430 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:01,978 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:06:05,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:06:09,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:06:12,553 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 6.287, 'learning_rate': 8.939999999999999e-05, 'epoch': 0.68} [WARNING|modeling_bart.py:1051] 2022-03-26 18:06:16,128 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:06:19,587 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:06:23,096 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:06:26,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:29,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:06:33,425 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:06:36,855 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:06:40,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 6.1091, 'learning_rate': 8.999999999999999e-05, 'epoch': 0.68} [WARNING|modeling_bart.py:1051] 2022-03-26 18:06:43,861 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:06:47,307 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:06:50,706 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:06:54,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:06:57,604 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:00,984 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:07:04,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:07:07,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:11,253 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:07:14,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:07:17,960 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:07:21,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:24,673 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:07:28,064 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:07:31,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:07:34,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:38,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:41,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.8993, 'learning_rate': 9.18e-05, 'epoch': 0.7}
{'loss': 4.8036, 'learning_rate': 9.24e-05, 'epoch': 0.7}
{'loss': 4.7089, 'learning_rate': 9.3e-05, 'epoch': 0.7}
7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]
{'loss': 4.658, 'learning_rate': 9.36e-05, 'epoch': 0.71}
{'loss': 4.5872, 'learning_rate': 9.419999999999999e-05, 'epoch': 0.71}
{'loss': 4.6761, 'learning_rate': 9.479999999999999e-05, 'epoch': 0.72}
{'loss': 4.5023, 'learning_rate': 9.539999999999999e-05, 'epoch': 0.72}
{'loss': 4.5411, 'learning_rate': 9.599999999999999e-05, 'epoch': 0.73}
{'loss': 4.5728, 'learning_rate': 9.659999999999999e-05, 'epoch': 0.73}
{'loss': 4.5043, 'learning_rate': 9.719999999999999e-05, 'epoch': 0.74}
[WARNING|modeling_utils.py:388] 2022-03-26 18:12:21,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:12:21,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.5902, 'learning_rate': 9.779999999999999e-05, 'epoch': 0.74} [WARNING|modeling_utils.py:388] 2022-03-26 18:12:21,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:12:21,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:12:21,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:12:21,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:12:21,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:12:21,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 18:12:21,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:12:21,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:12:21,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:12:21,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.4932, 'learning_rate': 9.839999999999999e-05, 'epoch': 0.74} 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 4.5148, 'learning_rate': 9.9e-05, 'epoch': 0.75} 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.5009, 'learning_rate': 9.96e-05, 'epoch': 0.75} 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.3554, 'learning_rate': 0.0001002, 'epoch': 0.76} 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 18:14:16,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:14:22,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 18:14:30,741 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:14:36,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 18:14:51,267 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.4837, 'learning_rate': 0.0001014, 'epoch': 0.77}
[WARNING|modeling_bart.py:1051] 2022-03-26 18:14:57,419 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.3605, 'learning_rate': 0.000102, 'epoch': 0.77}
[WARNING|modeling_bart.py:1051] 2022-03-26 18:14:57,419 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 8%|█████▊ | 173/2230 [1:04:07<13:55:40, 24.38s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 8%|█████▊ | 173/2230 [1:04:07<13:55:40, 24.38s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.423, 'learning_rate': 0.0001026, 'epoch': 0.78} 8%|█████▊ | 173/2230 [1:04:07<13:55:40, 24.38s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 8%|█████▊ | 173/2230 [1:04:07<13:55:40, 24.38s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 8%|█████▊ | 173/2230 [1:04:07<13:55:40, 24.38s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 8%|█████▊ | 173/2230 [1:04:07<13:55:40, 24.38s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 8%|█████▊ | 173/2230 [1:04:07<13:55:40, 24.38s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 8%|█████▊ | 173/2230 [1:04:07<13:55:40, 24.38s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
8%|█████▊ | 173/2230 [1:04:07<13:55:40, 24.38s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 8%|█████▊ | 173/2230 [1:04:07<13:55:40, 24.38s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 8%|█████▊ | 173/2230 [1:04:07<13:55:40, 24.38s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 8%|█████▊ | 173/2230 [1:04:07<13:55:40, 24.38s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 8%|█████▊ | 173/2230 [1:04:07<13:55:40, 24.38s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.3804, 'learning_rate': 0.00010319999999999999, 'epoch': 0.78} 8%|█████▊ | 173/2230 [1:04:07<13:55:40, 24.38s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.4725, 'learning_rate': 0.00010379999999999999, 'epoch': 0.78} [WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.4212, 'learning_rate': 0.00010439999999999999, 'epoch': 0.79} 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 4.4313, 'learning_rate': 0.00010499999999999999, 'epoch': 0.79}
{'loss': 4.3636, 'learning_rate': 0.00010559999999999998, 'epoch': 0.8}
{'loss': 4.4336, 'learning_rate': 0.00010619999999999998, 'epoch': 0.8}
{'loss': 4.3958, 'learning_rate': 0.00010679999999999998, 'epoch': 0.81}
[WARNING|modeling_utils.py:388] 2022-03-26 18:18:28,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.529, 'learning_rate': 0.00010739999999999998, 'epoch': 0.81}
{'loss': 4.3276, 'learning_rate': 0.00010799999999999998, 'epoch': 0.82}
[WARNING|modeling_utils.py:388] 2022-03-26 18:19:30,022 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.5321, 'learning_rate': 0.00010859999999999998, 'epoch': 0.82}
[WARNING|modeling_utils.py:388] 2022-03-26 18:19:34,182 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:19:38,217 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:19:42,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.4487, 'learning_rate': 0.00010919999999999998, 'epoch': 0.83}
[WARNING|modeling_utils.py:388] 2022-03-26 18:20:04,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
8%|██████▏ | 185/2230 [1:08:40<12:23:57, 21.83s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 18:20:15,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 18:20:27,331 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
8%|██████▎ | 186/2230 [1:09:00<12:10:20, 21.44s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 18:20:35,514 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:20:45,626 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
8%|██████▎ | 187/2230 [1:09:21<11:57:53, 21.08s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 18:20:54,029 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.407, 'learning_rate': 0.00011099999999999999, 'epoch': 0.84}
[WARNING|modeling_bart.py:1051] 2022-03-26 18:21:00,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 18:21:09,959 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
8%|██████▎ | 188/2230 [1:09:41<11:50:45, 20.88s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 18:21:16,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 18:21:20,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 18:21:24,447 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:21:30,346 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:21:32,676 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3983, 'learning_rate': 0.00011219999999999999, 'epoch': 0.85}
[WARNING|modeling_utils.py:388] 2022-03-26 18:21:38,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:21:40,901 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 18:21:45,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:21:47,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 18:21:51,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.4501, 'learning_rate': 0.00011279999999999999, 'epoch': 0.85}
[WARNING|modeling_bart.py:1051] 2022-03-26 18:21:55,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:21:57,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 18:22:01,144 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:22:03,305 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:22:05,463 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:22:07,607 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:22:09,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:22:11,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:22:13,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:22:15,978 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:22:17,968 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:22:19,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:22:21,909 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:22:23,868 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3934, 'learning_rate': 0.00011399999999999999, 'epoch': 0.86}
[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:27,498 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:29,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:31,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:33,181 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:35,034 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:36,871 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▌ | 193/2230 [1:11:07<9:46:56, 17.29s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:40,574 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:42,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:44,120 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:45,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:47,596 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:49,291 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:51,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:53,422 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▌ | 194/2230 [1:11:22<9:19:45, 16.50s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:55,234 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:58,534 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:00,107 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:01,663 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:03,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:06,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▋ | 195/2230 [1:11:35<8:40:49, 15.36s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:07,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:09,291 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:12,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:13,540 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:16,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:17,560 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:18,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:20,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:21,528 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:24,010 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:26,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▋ | 197/2230 [1:11:56<7:15:59, 12.87s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:28,910 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:30,042 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:32,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:34,316 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:36,323 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:37,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:38,369 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:40,218 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:42,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▊ | 199/2230 [1:12:12<5:48:21, 10.29s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:44,639 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:46,263 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:48,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:49,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▊ | 200/2230 [1:12:19<5:13:36, 9.27s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:52,667 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:56,506 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:00,108 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:03,609 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:07,140 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:10,586 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:14,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:17,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▊ | 201/2230 [1:12:47<8:29:35, 15.07s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:21,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:24,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:27,852 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:31,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:34,536 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:37,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:41,239 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:44,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▊ | 202/2230 [1:13:14<10:31:00, 18.67s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:48,026 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:51,365 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:54,696 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:58,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:01,379 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:04,681 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:07,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:11,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▊ | 203/2230 [1:13:41<11:50:59, 21.05s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:14,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.0932, 'learning_rate': 0.00012059999999999999, 'epoch': 0.91}
[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:17,952 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:21,237 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:24,556 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:27,863 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:31,061 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:34,292 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:37,571 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▊ | 204/2230 [1:14:07<12:43:56, 22.62s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:44,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.6482, 'learning_rate': 0.00012179999999999999, 'epoch': 0.92}
{'loss': 4.7276, 'learning_rate': 0.0001224, 'epoch': 0.92}
{'loss': 4.6034, 'learning_rate': 0.00012299999999999998, 'epoch': 0.93}
{'loss': 4.58, 'learning_rate': 0.0001236, 'epoch': 0.93}
{'loss': 4.4605, 'learning_rate': 0.00012419999999999998, 'epoch': 0.94}
{'loss': 4.4878, 'learning_rate': 0.00012479999999999997, 'epoch': 0.94}
9%|███████ | 211/2230 [1:17:02<13:36:11, 24.26s/it]
{'loss': 4.502, 'learning_rate': 0.00012539999999999999, 'epoch': 0.95}
{'loss': 4.5168, 'learning_rate': 0.00012599999999999997, 'epoch': 0.95}
10%|███████▏ | 213/2230 [1:17:49<13:23:51, 23.91s/it]
{'loss': 4.5251, 'learning_rate': 0.0001266, 'epoch': 0.96}
[WARNING|modeling_utils.py:388] 2022-03-26 18:29:29,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:29:34,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:29:42,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.4259, 'learning_rate': 0.00012719999999999997, 'epoch': 0.96}
[WARNING|modeling_utils.py:388] 2022-03-26 18:30:02,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3709, 'learning_rate': 0.0001278, 'epoch': 0.96}
[WARNING|modeling_utils.py:388] 2022-03-26 18:30:11,032 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 18:30:23,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:30:23,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:30:23,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:30:23,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.5284, 'learning_rate': 0.00012839999999999998, 'epoch': 0.97} [WARNING|modeling_bart.py:1051] 2022-03-26 18:30:23,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:30:23,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:30:35,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:30:35,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 18:30:35,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:30:35,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:30:43,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:30:43,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:30:43,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:30:43,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:30:50,029 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 18:30:50,029 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:30:50,029 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:30:56,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:30:56,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:30:59,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:30:59,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:31:04,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
 10%|███████▎ | 218/2230 [1:19:33<11:39:47, 20.87s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.3609, 'learning_rate': 0.00012959999999999998, 'epoch': 0.98}
[WARNING|modeling_utils.py:388] 2022-03-26 18:31:10,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 18:31:13,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:31:15,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:31:17,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 18:31:22,019 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:31:24,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:31:25,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:31:27,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:31:29,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:31:31,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:31:34,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:31:36,400 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:31:38,104 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:31:39,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:31:41,180 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:31:44,067 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:31:45,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:31:48,048 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:31:49,435 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:31:51,767 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:31:53,944 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:31:55,997 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:31:57,994 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:32:00,601 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:32:02,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:32:03,632 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:32:06,361 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:32:09,987 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:32:13,650 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:32:17,214 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:32:20,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:20,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:24,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:24,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:27,741 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:27,741 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:31,281 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 18:32:31,281 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:34,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:34,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:38,364 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:38,364 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:41,878 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:45,295 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 18:32:45,295 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:48,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:48,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:52,190 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:55,633 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:55,633 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 5.7454, 'learning_rate': 0.0001338, 'epoch': 1.01} [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 5.6168, 'learning_rate': 0.0001344, 'epoch': 1.01} [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 5.1562, 'learning_rate': 0.000135, 'epoch': 1.02} [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 4.9135, 'learning_rate': 0.0001356, 'epoch': 1.02}
{'loss': 4.6708, 'learning_rate': 0.0001362, 'epoch': 1.03}
{'loss': 4.5292, 'learning_rate': 0.0001368, 'epoch': 1.03}
{'loss': 4.4944, 'learning_rate': 0.0001374, 'epoch': 1.04}
{'loss': 4.3521, 'learning_rate': 0.000138, 'epoch': 1.04}
{'loss': 4.4591, 'learning_rate': 0.0001386, 'epoch': 1.04}
{'loss': 4.3587, 'learning_rate': 0.0001392, 'epoch': 1.05}
{'loss': 4.3937, 'learning_rate': 0.00013979999999999998, 'epoch': 1.05}
{'loss': 4.3795, 'learning_rate': 0.0001404, 'epoch': 1.06}
{'loss': 4.2489, 'learning_rate': 0.00014099999999999998, 'epoch': 1.06}
{'loss': 4.3007, 'learning_rate': 0.00014159999999999997, 'epoch': 1.07}
{'loss': 4.2813, 'learning_rate': 0.0001422, 'epoch': 1.07}
{'loss': 4.3177, 'learning_rate': 0.00014279999999999997, 'epoch': 1.08}
{'loss': 4.0955, 'learning_rate': 0.0001434, 'epoch': 1.08}
{'loss': 4.1885, 'learning_rate': 0.00014399999999999998, 'epoch': 1.09}
{'loss': 4.1224, 'learning_rate': 0.0001446, 'epoch': 1.09}
{'loss': 4.2549, 'learning_rate': 0.00014519999999999998, 'epoch': 1.09}
{'loss': 4.1249, 'learning_rate': 0.0001458, 'epoch': 1.1}
 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]
{'loss': 4.1269, 'learning_rate': 0.00014639999999999998, 'epoch': 1.1}
{'loss': 4.1725, 'learning_rate': 0.000147, 'epoch': 1.11}
11%|████████▎ | 248/2230 [1:31:12<13:09:18, 23.89s/it]
{'loss': 4.2151, 'learning_rate': 0.00014759999999999998, 'epoch': 1.11}
[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.1098, 'learning_rate': 0.0001482, 'epoch': 1.12}
{'loss': 4.1609, 'learning_rate': 0.00014879999999999998, 'epoch': 1.12}
11%|████████▍ | 251/2230 [1:32:22<12:55:26, 23.51s/it]
{'loss': 4.0873, 'learning_rate': 0.0001494, 'epoch': 1.13}
{'loss': 4.0377, 'learning_rate': 0.00015, 'epoch': 1.13}
[WARNING|modeling_utils.py:388] 2022-03-26 18:44:35,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:44:39,574 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.1351, 'learning_rate': 0.00015059999999999997, 'epoch': 1.13}
{'loss': 4.0665, 'learning_rate': 0.0001512, 'epoch': 1.14}
[WARNING|modeling_utils.py:388] 2022-03-26 18:45:10,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:45:18,302 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.0937, 'learning_rate': 0.00015179999999999998, 'epoch': 1.14}
[WARNING|modeling_utils.py:388] 2022-03-26 18:45:26,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:45:46,673 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.0957, 'learning_rate': 0.00015299999999999998, 'epoch': 1.15}
[WARNING|modeling_utils.py:388] 2022-03-26 18:46:11,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:46:15,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:46:21,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.1556, 'learning_rate': 0.0001536, 'epoch': 1.16}
[WARNING|modeling_bart.py:1051] 2022-03-26 18:46:33,950 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 18:46:39,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:46:49,711 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:46:55,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 18:47:02,000 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 18:47:08,194 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.0407, 'learning_rate': 0.0001548, 'epoch': 1.17} [WARNING|modeling_bart.py:1051] 2022-03-26 18:47:12,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:47:16,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 18:47:24,734 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:47:27,134 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:47:31,208 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 18:47:34,706 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 18:47:38,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:47:42,842 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 18:47:45,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 18:47:49,309 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 18:47:53,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 18:47:59,541 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:03,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 12%|████████▊ | 263/2230 [1:36:33<10:42:01, 19.58s/it] [WARNING|modeling_utils.py:388] 2022-03-26 18:48:07,437 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 18:48:09,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 18:48:11,731 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 18:48:13,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 18:48:15,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 18:48:18,040 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 18:48:20,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 18:48:22,137 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.0776, 'learning_rate': 0.0001572, 'epoch': 1.18} [WARNING|modeling_bart.py:1051] 2022-03-26 18:48:25,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:48:27,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:29,872 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:48:31,848 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:48:33,814 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:48:35,712 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:48:37,624 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:48:39,644 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:41,518 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:48:43,334 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:48:45,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:48:47,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:48:50,646 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:48:52,412 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:54,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:48:56,010 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:48:57,709 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:48:59,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:02,633 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:04,246 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:05,834 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:07,499 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:10,601 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:12,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:13,606 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:16,546 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:17,934 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:20,715 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:22,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:24,658 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:25,922 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:29,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:30,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:32,892 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:34,037 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:36,312 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:38,482 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:40,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:42,868 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:44,843 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:46,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:48,610 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:51,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:52,935 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:54,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:57,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:59,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:49:59,808 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:50:03,283 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:06,834 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:50:10,391 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:50:13,975 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:50:17,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:21,077 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:24,563 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:28,043 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:31,646 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:35,144 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:38,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:41,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:45,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:48,885 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:52,321 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:55,757 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:00,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:03,779 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:07,181 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:10,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:13,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:17,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:20,762 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:24,144 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:27,554 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:30,903 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:34,247 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:37,592 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:40,993 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:44,404 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:47,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:51,132 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:54,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.6105, 'learning_rate': 0.0001656, 'epoch': 1.25}
{'loss': 4.6108, 'learning_rate': 0.0001662, 'epoch': 1.25}
{'loss': 4.5733, 'learning_rate': 0.0001668, 'epoch': 1.26}
{'loss': 4.4, 'learning_rate': 0.0001674, 'epoch': 1.26}
{'loss': 4.4329, 'learning_rate': 0.000168, 'epoch': 1.26}
[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
13%|█████████▌ | 283/2230 [1:42:58<14:01:49, 25.94s/it]
{'loss': 4.3477, 'learning_rate': 0.0001686, 'epoch': 1.27}
{'loss': 4.2902, 'learning_rate': 0.00016919999999999997, 'epoch': 1.27}
[WARNING|modeling_utils.py:388] 2022-03-26 18:55:03,861 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3448, 'learning_rate': 0.00016979999999999998, 'epoch': 1.28}
13%|█████████▌ | 286/2230 [1:44:14<13:49:37, 25.61s/it]
{'loss': 4.2629, 'learning_rate': 0.00017099999999999998, 'epoch': 1.29}
{'loss': 4.2435, 'learning_rate': 0.00017159999999999997, 'epoch': 1.29}
13%|█████████▋ | 289/2230 [1:45:30<13:41:02, 25.38s/it]
{'loss': 4.1274, 'learning_rate': 0.00017219999999999998, 'epoch': 1.3}
13%|█████████▊ | 290/2230 [1:45:55<13:34:22, 25.19s/it]
{'loss': 4.28, 'learning_rate': 0.00017279999999999997, 'epoch': 1.3}
13%|█████████▊ | 291/2230 [1:46:19<13:27:59, 25.00s/it]
13%|█████████▊ | 292/2230 [1:46:44<13:22:54, 24.86s/it]
{'loss': 4.1554, 'learning_rate': 0.00017399999999999997, 'epoch': 1.31}
{'loss': 4.1921, 'learning_rate': 0.00017459999999999996, 'epoch': 1.31}
{'loss': 4.1944, 'learning_rate': 0.00017519999999999998, 'epoch': 1.32}
{'loss': 4.1347, 'learning_rate': 0.00017579999999999996, 'epoch': 1.32}
13%|█████████▊ | 292/2230 [1:46:44<13:22:54, 24.86s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▊ | 292/2230 [1:46:44<13:22:54, 24.86s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▊ | 292/2230 [1:46:44<13:22:54, 24.86s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▊ | 292/2230 [1:46:44<13:22:54, 24.86s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▊ | 292/2230 [1:46:44<13:22:54, 24.86s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▊ | 292/2230 [1:46:44<13:22:54, 24.86s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▊ | 292/2230 [1:46:44<13:22:54, 24.86s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.1154, 'learning_rate': 0.00017639999999999998, 'epoch': 1.33} 13%|█████████▊ | 292/2230 [1:46:44<13:22:54, 24.86s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▊ | 292/2230 [1:46:44<13:22:54, 24.86s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
13%|█████████▉ | 297/2230 [1:48:45<12:57:47, 24.14s/it]
{'loss': 4.122, 'learning_rate': 0.00017759999999999998, 'epoch': 1.34}
{'loss': 4.1743, 'learning_rate': 0.00017819999999999997, 'epoch': 1.34}
[WARNING|modeling_utils.py:388] 2022-03-26 19:01:25,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.1579, 'learning_rate': 0.00017879999999999998, 'epoch': 1.35}
[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.1053, 'learning_rate': 0.00017939999999999997, 'epoch': 1.35}
{'loss': 4.0905, 'learning_rate': 0.00017999999999999998, 'epoch': 1.35}
{'loss': 4.1271, 'learning_rate': 0.00018059999999999997, 'epoch': 1.36}
{'loss': 4.0824, 'learning_rate': 0.00018119999999999999, 'epoch': 1.36}
{'loss': 4.0832, 'learning_rate': 0.00018179999999999997, 'epoch': 1.37}
[WARNING|modeling_utils.py:388] 2022-03-26 19:03:40,246 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.0989, 'learning_rate': 0.0001824, 'epoch': 1.37}
[WARNING|modeling_utils.py:388] 2022-03-26 19:03:44,312 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:03:50,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:04:00,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.137, 'learning_rate': 0.00018299999999999998, 'epoch': 1.38}
[WARNING|modeling_utils.py:388] 2022-03-26 19:04:08,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 19:04:21,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.1543, 'learning_rate': 0.0001836, 'epoch': 1.38}
[WARNING|modeling_utils.py:388] 2022-03-26 19:04:29,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:04:36,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:04:43,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:04:47,184 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:04:53,387 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:04:59,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 19:05:03,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.0728, 'learning_rate': 0.0001848, 'epoch': 1.39}
[WARNING|modeling_bart.py:1051] 2022-03-26 19:05:10,058 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:05:16,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 19:05:19,988 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:05:22,331 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.9376, 'learning_rate': 0.00018539999999999998, 'epoch': 1.39}
[WARNING|modeling_utils.py:388] 2022-03-26 19:05:28,310 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 19:05:32,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 19:05:36,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:05:38,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:05:40,944 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.8833, 'learning_rate': 0.000186, 'epoch': 1.4}
[WARNING|modeling_utils.py:388] 2022-03-26 19:05:46,711 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:05:48,966 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:05:56,470 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:05:58,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:06:00,819 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.0698, 'learning_rate': 0.00018659999999999998, 'epoch': 1.4}
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:04,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:07,000 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:09,110 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:11,216 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:13,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:15,339 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:17,377 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:19,499 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:21,469 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:23,452 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:25,441 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:27,413 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:29,352 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:31,250 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:33,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:35,120 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:36,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:38,825 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:40,659 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:42,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:45,912 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:47,608 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:49,425 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:51,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:52,764 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:56,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:57,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:59,234 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:02,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:04,011 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:05,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:07,013 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:11,424 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:14,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:15,688 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:18,340 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:19,663 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:20,974 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:24,209 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:25,603 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:28,008 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:30,394 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:31,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:33,789 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:36,094 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:38,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:40,274 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:42,258 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:44,270 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:46,077 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:48,727 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:50,504 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:52,132 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:54,409 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:55,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:59,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:02,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:06,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:09,986 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:13,518 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:17,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:20,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:24,081 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:27,646 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:31,117 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:34,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:38,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:41,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:44,859 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:48,286 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:51,700 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.3192, 'learning_rate': 0.0001938, 'epoch': 1.46}
[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:56,255 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:59,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:03,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:06,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:09,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:13,218 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:16,574 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:19,991 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.0983, 'learning_rate': 0.00019439999999999998, 'epoch': 1.46}
[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:23,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:26,744 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:30,080 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:33,411 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:36,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:40,165 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:43,504 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:46,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.7318, 'learning_rate': 0.000195, 'epoch': 1.47}
[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:50,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.6187, 'learning_rate': 0.00019559999999999998, 'epoch': 1.47}
15%|███████████ | 329/2230 [1:59:08<13:08:53, 24.90s/it]
{'loss': 4.4429, 'learning_rate': 0.0001962, 'epoch': 1.48}
{'loss': 4.3152, 'learning_rate': 0.00019679999999999999, 'epoch': 1.48}
{'loss': 4.4061, 'learning_rate': 0.0001974, 'epoch': 1.48}
{'loss': 4.3511, 'learning_rate': 0.000198, 'epoch': 1.49}
{'loss': 4.3352, 'learning_rate': 0.0001986, 'epoch': 1.49}
{'loss': 4.2468, 'learning_rate': 0.0001992, 'epoch': 1.5}
{'loss': 4.2837, 'learning_rate': 0.0001998, 'epoch': 1.5}
{'loss': 4.1831, 'learning_rate': 0.0002004, 'epoch': 1.51}
{'loss': 4.2178, 'learning_rate': 0.000201, 'epoch': 1.51}
{'loss': 4.1241, 'learning_rate': 0.0002016, 'epoch': 1.52}
{'loss': 4.1374, 'learning_rate': 0.0002022, 'epoch': 1.52}
{'loss': 4.0581, 'learning_rate': 0.0002028, 'epoch': 1.52}
{'loss': 4.1244, 'learning_rate': 0.00020339999999999998, 'epoch': 1.53}
15%|███████████▌ | 342/2230 [2:04:40<13:01:31, 24.84s/it]
{'loss': 4.101, 'learning_rate': 0.000204, 'epoch': 1.53}
15%|███████████▌ | 343/2230 [2:05:04<12:56:13, 24.68s/it]
{'loss': 4.0908, 'learning_rate': 0.00020459999999999999, 'epoch': 1.54}
[WARNING|modeling_utils.py:388] 2022-03-26 19:16:47,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.9862, 'learning_rate': 0.0002052, 'epoch': 1.54}
15%|███████████▌ | 345/2230 [2:05:54<12:53:10, 24.61s/it]
{'loss': 4.1008, 'learning_rate': 0.0002058, 'epoch': 1.55}
{'loss': 4.0601, 'learning_rate': 0.00020639999999999998, 'epoch': 1.55}
{'loss': 4.063, 'learning_rate': 0.00020699999999999996, 'epoch': 1.56}
{'loss': 4.0309, 'learning_rate': 0.00020759999999999998, 'epoch': 1.56}
{'loss': 4.0944, 'learning_rate': 0.00020819999999999996, 'epoch': 1.57}
{'loss': 4.0536, 'learning_rate': 0.00020879999999999998, 'epoch': 1.57}
[WARNING|modeling_utils.py:388] 2022-03-26 19:19:47,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.0323, 'learning_rate': 0.00020939999999999997, 'epoch': 1.57}
[WARNING|modeling_bart.py:1051] 2022-03-26 19:20:00,024 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 19:20:10,378 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.041, 'learning_rate': 0.00020999999999999998, 'epoch': 1.58}
[WARNING|modeling_utils.py:388] 2022-03-26 19:20:18,868 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:20:23,029 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
16%|███████████▊ | 353/2230 [2:09:00<12:00:49, 23.04s/it]
{'loss': 4.0926, 'learning_rate': 0.00021059999999999997, 'epoch': 1.58}
[WARNING|modeling_utils.py:388] 2022-03-26 19:20:45,293 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:20:53,567 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.053, 'learning_rate': 0.00021119999999999996, 'epoch': 1.59}
[WARNING|modeling_utils.py:388] 2022-03-26 19:21:01,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.9695, 'learning_rate': 0.00021179999999999997, 'epoch': 1.59}
[WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:21:40,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:21:40,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:21:40,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:21:40,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:21:40,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 19:21:40,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:21:52,728 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:21:52,728 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:21:52,728 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:21:52,728 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 16%|████████████ | 357/2230 [2:10:28<11:33:25, 22.21s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 16%|████████████ | 357/2230 [2:10:28<11:33:25, 22.21s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 3.978, 'learning_rate': 0.00021299999999999997, 'epoch': 1.6} 16%|████████████ | 357/2230 [2:10:28<11:33:25, 22.21s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:22:07,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:22:07,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:22:07,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:22:07,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:22:07,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:22:17,607 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 19:22:17,607 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 16%|████████████ | 358/2230 [2:10:49<11:19:29, 21.78s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 16%|████████████ | 358/2230 [2:10:49<11:19:29, 21.78s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.9869, 'learning_rate': 0.00021359999999999996, 'epoch': 1.61} 16%|████████████ | 358/2230 [2:10:49<11:19:29, 21.78s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:22:27,869 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:22:27,869 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:22:27,869 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 19:22:27,869 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:22:36,295 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:22:36,295 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:22:36,295 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 16%|████████████ | 359/2230 [2:11:09<11:06:21, 21.37s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 16%|████████████ | 359/2230 [2:11:09<11:06:21, 21.37s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.983, 'learning_rate': 0.00021419999999999998, 'epoch': 1.61} 16%|████████████ | 359/2230 [2:11:09<11:06:21, 21.37s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 16%|████████████ | 359/2230 [2:11:09<11:06:21, 21.37s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 19:22:50,274 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:22:56,471 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
16%|████████████ | 360/2230 [2:11:29<10:53:04, 20.95s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 19:23:04,349 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 19:23:08,771 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 19:23:12,892 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:23:18,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.0257, 'learning_rate': 0.00021539999999999998, 'epoch': 1.62}
[WARNING|modeling_utils.py:388] 2022-03-26 19:23:24,967 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 19:23:29,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 19:23:33,180 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 19:23:37,400 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:23:39,702 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.9414, 'learning_rate': 0.00021599999999999996, 'epoch': 1.62}
[WARNING|modeling_utils.py:388] 2022-03-26 19:23:43,654 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:23:45,946 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 19:23:52,032 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:23:57,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
16%|████████████▏ | 363/2230 [2:12:27<10:12:12, 19.67s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 19:24:03,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:24:05,639 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:24:07,792 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:24:09,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:24:11,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:24:14,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:17,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:19,881 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:21,872 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:23,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:25,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:27,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:29,598 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:31,467 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:33,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:35,339 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:37,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:39,049 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:42,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:44,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:46,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:47,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:49,639 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:51,322 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:54,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:56,272 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:57,861 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:59,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:02,649 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:04,174 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:05,693 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:07,180 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:10,119 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:11,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:14,445 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:15,792 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:18,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:19,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:22,980 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:24,355 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:25:26,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:25:27,955 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:25:30,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:25:32,398 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:25:32,398 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:25:34,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:36,594 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:38,543 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:40,424 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:42,373 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:45,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:46,789 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:48,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:50,855 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:52,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:53,639 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:56,955 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:00,523 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:04,061 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:07,658 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:11,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:14,716 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:18,200 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:21,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:25,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:28,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:32,150 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:35,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:38,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:42,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:45,848 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:49,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:53,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:57,184 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:00,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:03,936 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:07,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:10,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:14,098 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:17,430 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.6441, 'learning_rate': 0.00022439999999999998, 'epoch': 1.69}
[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:20,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:24,087 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:27,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:30,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:34,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:37,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:40,749 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:44,037 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:47,500 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.3542, 'learning_rate': 0.00022559999999999998, 'epoch': 1.7}
{'loss': 4.231, 'learning_rate': 0.00022619999999999997, 'epoch': 1.7}
{'loss': 4.2993, 'learning_rate': 0.00022679999999999998, 'epoch': 1.7}
[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2949, 'learning_rate': 0.00022739999999999997, 'epoch': 1.71}
{'loss': 4.2101, 'learning_rate': 0.00022799999999999999, 'epoch': 1.71}
{'loss': 4.3218, 'learning_rate': 0.00022859999999999997, 'epoch': 1.72}
{'loss': 4.1069, 'learning_rate': 0.0002292, 'epoch': 1.72}
{'loss': 4.0308, 'learning_rate': 0.00022979999999999997, 'epoch': 1.73}
{'loss': 4.0998, 'learning_rate': 0.0002304, 'epoch': 1.73}
{'loss': 4.0487, 'learning_rate': 0.00023099999999999998, 'epoch': 1.74}
17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]
{'loss': 4.0931, 'learning_rate': 0.0002316, 'epoch': 1.74}
{'loss': 4.0471, 'learning_rate': 0.00023219999999999998, 'epoch': 1.74}
{'loss': 4.0103, 'learning_rate': 0.0002328, 'epoch': 1.75}
{'loss': 4.0283, 'learning_rate': 0.00023339999999999998, 'epoch': 1.75}
{'loss': 4.0728, 'learning_rate': 0.000234, 'epoch': 1.76}
{'loss': 4.0685, 'learning_rate': 0.00023459999999999998, 'epoch': 1.76}
{'loss': 4.0599, 'learning_rate': 0.0002352, 'epoch': 1.77}
[WARNING|modeling_bart.py:1051] 2022-03-26 19:35:21,300 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.0421, 'learning_rate': 0.00023579999999999999, 'epoch': 1.77}
18%|█████████████▎ | 396/2230 [2:24:15<12:25:51, 24.40s/it]
{'loss': 3.9362, 'learning_rate': 0.0002364, 'epoch': 1.78}
[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.08, 'learning_rate': 0.000237, 'epoch': 1.78}
{'loss': 4.0144, 'learning_rate': 0.0002376, 'epoch': 1.78}
[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.0673, 'learning_rate': 0.0002382, 'epoch': 1.79}
{'loss': 4.0356, 'learning_rate': 0.0002388, 'epoch': 1.79}
{'loss': 3.9756, 'learning_rate': 0.0002394, 'epoch': 1.8}
{'loss': 3.9296, 'learning_rate': 0.00023999999999999998, 'epoch': 1.8}
[WARNING|modeling_utils.py:388] 2022-03-26 19:38:29,632 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.9861, 'learning_rate': 0.0002406, 'epoch': 1.81}
[WARNING|modeling_utils.py:388] 2022-03-26 19:38:35,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:38:51,969 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.0462, 'learning_rate': 0.00024119999999999998, 'epoch': 1.81}
[WARNING|modeling_utils.py:388] 2022-03-26 19:39:04,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.82, 'learning_rate': 0.0002418, 'epoch': 1.82}
[WARNING|modeling_bart.py:1051] 2022-03-26 19:39:27,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.9398, 'learning_rate': 0.00024239999999999998, 'epoch': 1.82}
[WARNING|modeling_bart.py:1051] 2022-03-26 19:39:39,352 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 19:39:47,373 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:39:51,297 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.9655, 'learning_rate': 0.000243, 'epoch': 1.83}
[WARNING|modeling_utils.py:388] 2022-03-26 19:40:05,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:40:12,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:40:16,018 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.8895, 'learning_rate': 0.00024359999999999999, 'epoch': 1.83}
[WARNING|modeling_utils.py:388] 2022-03-26 19:40:23,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 19:40:28,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:40:38,586 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.0266, 'learning_rate': 0.00024419999999999997, 'epoch': 1.83}
[WARNING|modeling_utils.py:388] 2022-03-26 19:40:46,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 19:40:54,823 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 19:40:58,977 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:41:02,768 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:41:08,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 19:41:13,336 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 19:41:17,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.9576, 'learning_rate': 0.00024539999999999995, 'epoch': 1.84}
[WARNING|modeling_bart.py:1051] 2022-03-26 19:41:25,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 19:41:29,410 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:41:31,701 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 19:41:35,875 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
18%|█████████████▊ | 412/2230 [2:30:05<10:07:30, 20.05s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 19:41:39,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:41:42,151 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 19:41:46,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:41:50,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:41:53,858 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:41:56,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.9656, 'learning_rate': 0.0002466, 'epoch': 1.85}
[WARNING|modeling_utils.py:388] 2022-03-26 19:41:59,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:42:02,000 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:42:04,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:42:06,437 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:10,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:12,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
19%|██████████████ | 414/2230 [2:30:41<9:34:30, 18.98s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:14,404 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:16,418 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:18,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:20,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:22,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:24,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:26,096 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:27,950 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
19%|██████████████▏ | 415/2230 [2:30:57<9:03:06, 17.95s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:29,925 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:31,776 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:33,603 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:35,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:37,289 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:40,900 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:42,677 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
19%|██████████████▏ | 416/2230 [2:31:11<8:33:00, 16.97s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:44,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:46,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:47,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:49,664 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:53,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:54,623 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:56,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
19%|██████████████▏ | 417/2230 [2:31:25<8:00:47, 15.91s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:57,958 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:01,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:02,644 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:04,170 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:07,194 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:08,653 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:11,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:13,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:14,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:17,044 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:19,176 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:20,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:23,121 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:24,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:26,729 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:29,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
19%|██████████████▎ | 420/2230 [2:31:59<6:20:25, 12.61s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:31,392 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:33,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:34,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:36,629 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
19%|██████████████▎ | 421/2230 [2:32:07<5:41:46, 11.34s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:41,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:43,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:45,116 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
19%|██████████████▍ | 422/2230 [2:32:14<5:04:23, 10.10s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:46,885 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:49,250 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:51,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
19%|██████████████▍ | 423/2230 [2:32:20<4:27:26, 8.88s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:53,850 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:57,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:00,960 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:04,419 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:07,876 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:11,296 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:14,738 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:18,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
19%|██████████████▍ | 424/2230 [2:32:48<7:19:36, 14.61s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:21,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:25,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:28,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:31,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:35,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:38,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:42,050 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:45,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
19%|██████████████▍ | 425/2230 [2:33:16<9:21:36, 18.67s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:49,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:53,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:56,649 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:00,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:03,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:06,712 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:10,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:13,410 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
19%|██████████████▎ | 426/2230 [2:33:43<10:36:06, 21.16s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:16,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:20,134 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:23,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:26,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:29,858 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:33,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:36,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:39,612 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
19%|██████████████▎ | 427/2230 [2:34:09<11:20:58, 22.66s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
19%|██████████████▎ | 427/2230 [2:34:09<11:20:58, 22.66s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:46,197 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.4286, 'learning_rate': 0.0002556, 'epoch': 1.92} [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.3649, 'learning_rate': 0.0002562, 'epoch': 1.92} [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.143, 'learning_rate': 0.00025679999999999995, 'epoch': 1.93} [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.1339, 'learning_rate': 0.00025739999999999997, 'epoch': 1.93} [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.9971, 'learning_rate': 0.0002586, 'epoch': 1.94} 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.0285, 'learning_rate': 0.00025919999999999996, 'epoch': 1.95} 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it] Setting `use_cache=False`...1] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:02,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:02,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:02,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:02,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:02,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:02,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:02,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:02,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:02,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 19:49:02,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:02,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:02,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:25,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:25,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:25,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:25,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 19:49:25,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:25,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:25,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:25,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:25,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:25,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:25,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 3.9439, 'learning_rate': 0.000261, 'epoch': 1.96} [WARNING|modeling_utils.py:388] 2022-03-26 19:49:48,143 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:48,143 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:48,143 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:48,143 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:48,143 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:48,143 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 19:49:48,143 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 19:50:04,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
20%|██████████████▋ | 438/2230 [2:38:36<11:32:21, 23.18s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 19:50:09,085 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.91, 'learning_rate': 0.00026159999999999996, 'epoch': 1.96}
[WARNING|modeling_bart.py:1051] 2022-03-26 19:50:20,993 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 19:50:29,234 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.819, 'learning_rate': 0.0002622, 'epoch': 1.97}
[WARNING|modeling_utils.py:388] 2022-03-26 19:50:33,207 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:50:39,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 19:50:47,824 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
20%|██████████████▊ | 440/2230 [2:39:17<10:52:42, 21.88s/it]
{'loss': 3.8625, 'learning_rate': 0.0002628, 'epoch': 1.97}
[WARNING|modeling_utils.py:388] 2022-03-26 19:50:55,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:50:58,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:51:04,025 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:08,270 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 19:51:12,186 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:51:14,417 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:51:16,616 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 19:51:18,783 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:22,559 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:24,601 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
20%|███████████████ | 442/2230 [2:39:54<9:54:25, 19.95s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:26,731 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:28,663 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:30,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:32,467 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:34,356 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:36,194 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:37,940 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
20%|███████████████ | 443/2230 [2:40:08<9:08:05, 18.40s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:41,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:43,035 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:44,603 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:46,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:49,089 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:51,284 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
20%|███████████████▏ | 444/2230 [2:40:21<8:17:32, 16.71s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:54,120 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:55,411 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:56,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:58,944 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:01,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
20%|███████████████▏ | 445/2230 [2:40:30<7:10:17, 14.46s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:03,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:05,018 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:06,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:09,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
20%|███████████████▏ | 446/2230 [2:40:37<6:00:53, 12.14s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:10,873 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:14,612 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:18,209 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:21,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:25,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:28,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:32,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:35,867 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
20%|███████████████▏ | 447/2230 [2:41:06<8:28:44, 17.12s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:42,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:46,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:49,787 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:53,225 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:56,673 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 19:53:00,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.3123, 'learning_rate': 0.0002676, 'epoch': 2.01}
{'loss': 4.9094, 'learning_rate': 0.00026819999999999996, 'epoch': 2.01}
{'loss': 4.4246, 'learning_rate': 0.0002688, 'epoch': 2.02}
[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3338, 'learning_rate': 0.0002694, 'epoch': 2.02}
{'loss': 4.0653, 'learning_rate': 0.00027, 'epoch': 2.03}
{'loss': 3.9293, 'learning_rate': 0.00027059999999999996, 'epoch': 2.03}
{'loss': 3.924, 'learning_rate': 0.0002712, 'epoch': 2.04}
{'loss': 3.8679, 'learning_rate': 0.0002718, 'epoch': 2.04}
{'loss': 3.8316, 'learning_rate': 0.0002724, 'epoch': 2.04}
{'loss': 3.6439, 'learning_rate': 0.00027299999999999997, 'epoch': 2.05}
[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.6644, 'learning_rate': 0.0002736, 'epoch': 2.05}
21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it]
{'loss': 3.4917, 'learning_rate': 0.0002742, 'epoch': 2.06}
{'loss': 3.5827, 'learning_rate': 0.0002748, 'epoch': 2.06}
{'loss': 3.5428, 'learning_rate': 0.00027539999999999997, 'epoch': 2.07}
{'loss': 3.5134, 'learning_rate': 0.000276, 'epoch': 2.07}
{'loss': 3.5094, 'learning_rate': 0.0002766, 'epoch': 2.08}
{'loss': 3.4225, 'learning_rate': 0.0002772, 'epoch': 2.08}
21%|███████████████▋ | 465/2230 [2:48:52<12:13:22, 24.93s/it]
{'loss': 3.4751, 'learning_rate': 0.0002778, 'epoch': 2.09}
21%|███████████████▋ | 466/2230 [2:49:17<12:08:02, 24.76s/it]
{'loss': 3.3967, 'learning_rate': 0.0002784, 'epoch': 2.09}
[WARNING|modeling_utils.py:388] 2022-03-26 20:01:00,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.3292, 'learning_rate': 0.000279, 'epoch': 2.09}
[WARNING|modeling_utils.py:388] 2022-03-26 20:01:33,026 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.3373, 'learning_rate': 0.00027959999999999997, 'epoch': 2.1}
[WARNING|modeling_bart.py:1051] 2022-03-26 20:01:47,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.3579, 'learning_rate': 0.0002802, 'epoch': 2.1}
21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it]
{'loss': 3.3337, 'learning_rate': 0.0002808, 'epoch': 2.11}
{'loss': 3.3649, 'learning_rate': 0.00028139999999999996, 'epoch': 2.11}
21%|███████████████▊ | 472/2230 [2:51:40<11:35:36, 23.74s/it]
{'loss': 3.2899, 'learning_rate': 0.00028199999999999997, 'epoch': 2.12}
{'loss': 3.2992, 'learning_rate': 0.0002826, 'epoch': 2.12}
[WARNING|modeling_utils.py:388] 2022-03-26 20:03:49,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.197, 'learning_rate': 0.00028319999999999994, 'epoch': 2.13}
[WARNING|modeling_utils.py:388] 2022-03-26 20:04:16,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.2116, 'learning_rate': 0.00028379999999999996, 'epoch': 2.13}
[WARNING|modeling_utils.py:388] 2022-03-26 20:04:37,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.251, 'learning_rate': 0.0002844, 'epoch': 2.13}
[WARNING|modeling_utils.py:388] 2022-03-26 20:04:49,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:04:53,934 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:05:02,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:05:06,246 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.2578, 'learning_rate': 0.000285, 'epoch': 2.14}
[WARNING|modeling_utils.py:388] 2022-03-26 20:05:10,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.2191, 'learning_rate': 0.00028559999999999995, 'epoch': 2.14}
[WARNING|modeling_utils.py:388] 2022-03-26 20:05:38,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.1944, 'learning_rate': 0.00028619999999999996, 'epoch': 2.15}
[WARNING|modeling_utils.py:388] 2022-03-26 20:05:57,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:06:07,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.2094, 'learning_rate': 0.0002868, 'epoch': 2.15}
[WARNING|modeling_utils.py:388] 2022-03-26 20:06:17,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:06:27,984 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.1629, 'learning_rate': 0.00028739999999999994, 'epoch': 2.16}
[WARNING|modeling_bart.py:1051] 2022-03-26 20:06:38,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 20:06:50,210 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.114, 'learning_rate': 0.00028799999999999995, 'epoch': 2.16}
[WARNING|modeling_bart.py:1051] 2022-03-26 20:06:54,841 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 20:07:02,696 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:07:08,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:07:11,113 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 3.1973, 'learning_rate': 0.00028859999999999997, 'epoch': 2.17} [WARNING|modeling_utils.py:388] 2022-03-26 20:07:17,159 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:07:23,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:07:25,414 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 20:07:29,604 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.0682, 'learning_rate': 0.0002892, 'epoch': 2.17} [WARNING|modeling_utils.py:388] 2022-03-26 20:07:33,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 20:07:37,687 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 20:07:41,510 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:07:43,769 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 20:07:47,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 20:07:50,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 20:07:53,965 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:07:56,177 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:07:58,358 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 20:08:02,287 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 20:08:04,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
22%|████████████████▌ | 486/2230 [2:56:34<9:13:51, 19.05s/it] [WARNING|modeling_utils.py:388] 2022-03-26 20:08:08,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:08:10,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:08:12,412 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:08:14,435 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:08:16,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:08:18,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:08:20,580 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:08:22,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:08:24,675 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:08:26,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:08:28,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:08:34,322 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:08:36,197 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:08:38,065 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:08:40,056 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:08:41,913 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:08:43,715 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:08:45,552 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:08:47,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:08:50,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:08:52,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:08:54,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:08:56,202 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:08:57,876 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:01,167 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:02,786 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:04,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:05,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:09:09,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:10,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:12,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:15,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:16,638 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:19,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:09:20,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:23,695 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:25,057 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:26,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:28,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:09:31,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:32,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:35,066 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:37,349 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:38,479 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:40,813 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:09:42,894 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:44,924 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:47,680 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:49,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:51,555 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:53,310 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:09:55,053 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:57,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:09:59,809 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:10:01,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:10:03,541 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:10:07,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:10:10,754 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:10:14,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:10:17,782 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:10:21,259 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:10:24,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:10:28,269 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 6.3452, 'learning_rate': 0.00029699999999999996, 'epoch': 2.23} [WARNING|modeling_utils.py:388] 2022-03-26 20:10:31,902 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:10:35,378 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:10:38,844 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:10:42,330 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 20:10:45,810 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:10:45,810 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:10:49,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:10:52,722 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:10:52,722 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:10:56,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:10:56,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 5.654, 'learning_rate': 0.00029759999999999997, 'epoch': 2.23} [WARNING|modeling_utils.py:388] 2022-03-26 20:10:59,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:11:03,102 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:11:03,102 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:11:06,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:11:06,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:11:09,938 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:11:13,331 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:11:13,331 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:11:16,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:11:16,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:11:20,136 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:11:23,473 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:11:23,473 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:11:23,473 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:11:26,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:11:26,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:11:30,251 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:11:33,575 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:11:33,575 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:11:36,895 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:11:36,895 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:11:40,197 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:11:43,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:11:43,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:11:46,901 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:11:50,215 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:11:50,215 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 0%| | 0/331 [00:00> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 0%| | 0/331 [00:00> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|▌ | 2/331 [00:01<03:27, 1.59it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
03/26/2022 20:21:39 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow
{'eval_loss': 3.9967808723449707, 'eval_wer': 1.6151527171757238, 'eval_runtime': 586.6262, 'eval_samples_per_second': 4.504, 'eval_steps_per_second': 0.564, 'epoch': 2.24}
03/26/2022 20:22:58 - WARNING - huggingface_hub.repository - Adding files tracked by Git LFS: ['wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb']. This may take a bit of time if the files are large.
22%|████████████████▍ | 501/2230 [3:12:17<111:06:30, 231.34s/it]
{'loss': 4.117, 'learning_rate': 0.00029939999999999996, 'epoch': 2.25}
{'loss': 3.9755, 'learning_rate': 0.0003, 'epoch': 2.25}
{'loss': 3.8336, 'learning_rate': 0.00029982658959537567, 'epoch': 2.26}
{'loss': 3.8994, 'learning_rate': 0.0002996531791907514, 'epoch': 2.26}
{'loss': 3.765, 'learning_rate': 0.00029947976878612716, 'epoch': 2.26}
{'loss': 3.677, 'learning_rate': 0.00029930635838150286, 'epoch': 2.27}
{'loss': 3.6203, 'learning_rate': 0.0002991329479768786, 'epoch': 2.27}
{'loss': 3.6362, 'learning_rate': 0.0002989595375722543, 'epoch': 2.28}
{'loss': 3.5662, 'learning_rate': 0.00029878612716763005, 'epoch': 2.28}
{'loss': 3.4386, 'learning_rate': 0.00029861271676300574, 'epoch': 2.29}
{'loss': 3.4955, 'learning_rate': 0.0002984393063583815, 'epoch': 2.29}
{'loss': 3.463, 'learning_rate': 0.0002982658959537572, 'epoch': 2.3}
22%|████████████████▍ | 501/2230 [3:12:17<111:06:30, 231.34s/it]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.4607, 'learning_rate': 0.00029809248554913293, 'epoch': 2.3} 22%|████████████████▍ | 501/2230 [3:12:17<111:06:30, 231.34s/it]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 22%|████████████████▍ | 501/2230 [3:12:17<111:06:30, 231.34s/it]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 22%|████████████████▍ | 501/2230 [3:12:17<111:06:30, 231.34s/it]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 22%|████████████████▍ | 501/2230 [3:12:17<111:06:30, 231.34s/it]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 22%|████████████████▍ | 501/2230 [3:12:17<111:06:30, 231.34s/it]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 22%|████████████████▍ | 501/2230 [3:12:17<111:06:30, 231.34s/it]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 22%|████████████████▍ | 501/2230 [3:12:17<111:06:30, 231.34s/it]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 22%|████████████████▍ | 501/2230 [3:12:17<111:06:30, 231.34s/it]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 3.4455, 'learning_rate': 0.0002979190751445086, 'epoch': 2.3}
{'loss': 3.4025, 'learning_rate': 0.00029774566473988437, 'epoch': 2.31}
23%|█████████████████▎ | 516/2230 [3:18:51<12:20:00, 25.90s/it]
{'loss': 3.2991, 'learning_rate': 0.00029757225433526006, 'epoch': 2.31}
23%|█████████████████▍ | 517/2230 [3:19:15<12:05:58, 25.43s/it]
{'loss': 3.2948, 'learning_rate': 0.0002973988439306358, 'epoch': 2.32}
[WARNING|modeling_bart.py:1051] 2022-03-26 20:31:06,936 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.3225, 'learning_rate': 0.00029722543352601156, 'epoch': 2.32}
{'loss': 3.1961, 'learning_rate': 0.00029705202312138725, 'epoch': 2.33}
{'loss': 3.2374, 'learning_rate': 0.00029687861271676295, 'epoch': 2.33}
{'loss': 3.2768, 'learning_rate': 0.0002967052023121387, 'epoch': 2.34}
[WARNING|modeling_bart.py:1051] 2022-03-26 20:32:30,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 20:32:43,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.3283, 'learning_rate': 0.00029653179190751444, 'epoch': 2.34}
[WARNING|modeling_bart.py:1051] 2022-03-26 20:33:10,117 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.2414, 'learning_rate': 0.00029635838150289014, 'epoch': 2.35}
[WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.1435, 'learning_rate': 0.00029618497109826583, 'epoch': 2.35}
[WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.1363, 'learning_rate': 0.0002960115606936416, 'epoch': 2.35} [WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 3.0984, 'learning_rate': 0.0002958381502890173, 'epoch': 2.36} [WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:33:17,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:34:39,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.1239, 'learning_rate': 0.000295664739884393, 'epoch': 2.36}
[WARNING|modeling_utils.py:388] 2022-03-26 20:34:39,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:34:56,310 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:35:00,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:35:04,592 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 2.9329, 'learning_rate': 0.00029549132947976877, 'epoch': 2.37}
[WARNING|modeling_utils.py:388] 2022-03-26 20:35:04,592 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:35:04,592 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
 24%|█████████████████▊ | 529/2230 [3:23:54<10:39:39, 22.56s/it]
{'loss': 3.0781, 'learning_rate': 0.0002953179190751445, 'epoch': 2.37}
 24%|█████████████████▊ | 529/2230 [3:23:54<10:39:39, 22.56s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 20:35:45,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 2.9579, 'learning_rate': 0.0002951445086705202, 'epoch': 2.38}
[WARNING|modeling_utils.py:388] 2022-03-26 20:35:45,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:35:57,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 20:36:01,948 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 20:36:05,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 20:36:05,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.0964, 'learning_rate': 0.0002949710982658959, 'epoch': 2.38}
[WARNING|modeling_bart.py:1051] 2022-03-26 20:36:05,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 20:36:22,328 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 20:36:30,520 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.0365, 'learning_rate': 0.00029479768786127165, 'epoch': 2.39}
[WARNING|modeling_utils.py:388] 2022-03-26 20:36:34,467 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 20:36:42,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 20:36:50,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 2.962, 'learning_rate': 0.0002946242774566474, 'epoch': 2.39}
[WARNING|modeling_utils.py:388] 2022-03-26 20:36:50,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:36:57,003 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:37:03,050 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:37:05,456 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 20:37:13,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 20:37:19,274 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 20:37:23,193 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:37:25,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
 24%|██████████████████▏ | 535/2230 [3:25:57<9:29:04, 20.14s/it][WARNING|modeling_bart.py:1051] 2022-03-26 20:37:29,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 20:37:33,638 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:37:35,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 20:37:39,972 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 20:37:43,696 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:37:45,927 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:37:48,208 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 20:37:52,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 20:37:54,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 20:37:56,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 20:37:58,403 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 20:38:00,452 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 20:38:02,488 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
24%|██████████████████▎ | 537/2230 [3:26:31<8:48:16, 18.72s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 20:38:06,696 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 20:38:08,694 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 20:38:10,703 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:14,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:16,834 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:18,765 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:20,673 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:22,727 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:24,583 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:26,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:28,266 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:30,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:31,883 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:33,650 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:37,315 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:39,021 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:40,728 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:42,454 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:44,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:47,397 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:48,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:50,673 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:53,786 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:55,307 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:56,816 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:38:59,723 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:01,122 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:03,958 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:05,316 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:07,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:09,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:11,781 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:13,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:15,534 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:17,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:20,026 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:22,256 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:24,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:26,278 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:27,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:29,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:31,810 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:33,556 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:36,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:37,949 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:39,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:41,845 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:43,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 1.6539, 'learning_rate': 0.0002923699421965318, 'epoch': 2.45}
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:47,046 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:50,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:54,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:39:57,920 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:40:01,503 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:40:04,955 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:40:08,474 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:40:11,967 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:40:15,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:40:19,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:40:22,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:40:26,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:40:29,525 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:40:32,930 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:40:36,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:40:39,853 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:40:43,403 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:40:46,856 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:40:50,342 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:40:53,786 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:40:57,277 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:41:00,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:41:04,102 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:41:07,424 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:41:10,869 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:41:14,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:41:17,592 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:41:20,941 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:41:24,355 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:41:27,680 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:41:31,029 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:41:35,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:41:38,833 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2709, 'learning_rate': 0.00029150289017341037, 'epoch': 2.47}
{'loss': 3.9271, 'learning_rate': 0.0002913294797687861, 'epoch': 2.48}
{'loss': 3.7921, 'learning_rate': 0.00029115606936416186, 'epoch': 2.48}
{'loss': 3.7069, 'learning_rate': 0.00029098265895953756, 'epoch': 2.48}
{'loss': 3.6785, 'learning_rate': 0.00029080924855491325, 'epoch': 2.49}
25%|██████████████████▋ | 556/2230 [3:32:42<12:05:28, 26.00s/it]
{'loss': 3.6915, 'learning_rate': 0.000290635838150289, 'epoch': 2.49}
{'loss': 3.4802, 'learning_rate': 0.00029046242774566475, 'epoch': 2.5}
{'loss': 3.393, 'learning_rate': 0.00029028901734104044, 'epoch': 2.5}
25%|██████████████████▊ | 559/2230 [3:34:00<11:59:50, 25.85s/it]
{'loss': 3.4273, 'learning_rate': 0.00029011560693641613, 'epoch': 2.51}
{'loss': 3.2599, 'learning_rate': 0.0002899421965317919, 'epoch': 2.51}
{'loss': 3.2463, 'learning_rate': 0.00028976878612716763, 'epoch': 2.52}
{'loss': 3.1323, 'learning_rate': 0.0002895953757225433, 'epoch': 2.52}
{'loss': 3.1298, 'learning_rate': 0.00028942196531791907, 'epoch': 2.52}
{'loss': 3.0224, 'learning_rate': 0.00028924855491329476, 'epoch': 2.53}
{'loss': 3.0445, 'learning_rate': 0.0002890751445086705, 'epoch': 2.53}
{'loss': 2.9297, 'learning_rate': 0.0002889017341040462, 'epoch': 2.54}
{'loss': 2.8817, 'learning_rate': 0.00028872832369942195, 'epoch': 2.54}
25%|███████████████████ | 568/2230 [3:37:43<11:19:36, 24.53s/it]
{'loss': 2.8181, 'learning_rate': 0.00028855491329479765, 'epoch': 2.55}
{'loss': 2.6842, 'learning_rate': 0.0002883815028901734, 'epoch': 2.55}
{'loss': 2.6766, 'learning_rate': 0.0002882080924855491, 'epoch': 2.56}
[WARNING|modeling_utils.py:388] 2022-03-26 20:50:23,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 2.6896, 'learning_rate': 0.00028803468208092484, 'epoch': 2.56}
[WARNING|modeling_utils.py:388] 2022-03-26 20:50:43,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 2.6096, 'learning_rate': 0.00028786127167630053, 'epoch': 2.57}
[WARNING|modeling_utils.py:388] 2022-03-26 20:50:43,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:50:43,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:04,272 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:04,272 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:04,272 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:04,272 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:04,272 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
26%|███████████████████▎ | 573/2230 [3:39:42<10:54:34, 23.70s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 26%|███████████████████▎ | 573/2230 [3:39:42<10:54:34, 23.70s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 2.5341, 'learning_rate': 0.0002876878612716763, 'epoch': 2.57} [WARNING|modeling_utils.py:388] 2022-03-26 20:51:18,719 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:18,719 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:18,719 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:18,719 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:18,719 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:51:18,719 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 20:51:31,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 20:51:31,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 20:51:31,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 20:51:31,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 20:51:31,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 20:51:31,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 2.4034, 'learning_rate': 0.000287514450867052, 'epoch': 2.57} [WARNING|modeling_utils.py:388] 2022-03-26 20:51:43,104 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:43,104 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:47,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:47,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:47,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:47,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:47,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:51:47,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:47,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:47,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:47,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 2.4183, 'learning_rate': 0.0002873410404624277, 'epoch': 2.58} [WARNING|modeling_utils.py:388] 2022-03-26 20:51:47,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:47,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:47,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:51:47,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:47,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:47,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:51:47,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 20:52:20,267 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 20:52:20,267 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 20:52:20,267 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
26%|███████████████████▎ | 576/2230 [3:40:51<10:41:21, 23.27s/it] Setting `use_cache=False`...e computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 26%|███████████████████▎ | 576/2230 [3:40:51<10:41:21, 23.27s/it] Setting `use_cache=False`...e computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 26%|███████████████████▎ | 576/2230 [3:40:51<10:41:21, 23.27s/it] Setting `use_cache=False`...e computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 26%|███████████████████▎ | 576/2230 [3:40:51<10:41:21, 23.27s/it] Setting `use_cache=False`...e computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 26%|███████████████████▎ | 576/2230 [3:40:51<10:41:21, 23.27s/it] Setting `use_cache=False`...e computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 26%|███████████████████▎ | 576/2230 [3:40:51<10:41:21, 23.27s/it] Setting `use_cache=False`...e computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 26%|███████████████████▎ | 576/2230 [3:40:51<10:41:21, 23.27s/it] Setting `use_cache=False`...e computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 26%|███████████████████▎ | 576/2230 [3:40:51<10:41:21, 23.27s/it] Setting `use_cache=False`...e computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:52:40,393 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:52:40,393 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:52:44,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:52:44,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:52:44,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:52:48,793 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:52:48,793 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:52:48,793 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:52:48,793 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:52:57,085 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:52:57,085 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:52:57,085 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:52:57,085 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:05,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:05,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:53:05,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:05,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 2.5114, 'learning_rate': 0.0002868208092485549, 'epoch': 2.59} [WARNING|modeling_utils.py:388] 2022-03-26 20:53:05,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:05,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:05,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:05,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:05,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:53:05,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:05,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:05,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:05,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:05,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 2.2027, 'learning_rate': 0.0002866473988439306, 'epoch': 2.6} [WARNING|modeling_utils.py:388] 2022-03-26 20:53:33,858 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:33,858 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:53:37,845 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:37,845 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:37,845 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:37,845 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:37,845 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:48,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:48,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:53:48,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 2.546, 'learning_rate': 0.00028647398843930635, 'epoch': 2.6} [WARNING|modeling_utils.py:388] 2022-03-26 20:53:48,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:48,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:48,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:48,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:48,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:53:48,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:54:06,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:54:06,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:54:06,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:54:06,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 2.2915, 'learning_rate': 0.0002863005780346821, 'epoch': 2.61} [WARNING|modeling_utils.py:388] 2022-03-26 20:54:06,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:54:06,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:54:06,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:54:06,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:54:06,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:54:06,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:54:06,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:54:29,323 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:54:29,323 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 26%|███████████████████▊ | 582/2230 [3:43:01<9:52:29, 21.57s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
26%|███████████████████▊ | 582/2230 [3:43:01<9:52:29, 21.57s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 2.4333, 'learning_rate': 0.0002861271676300578, 'epoch': 2.61} 26%|███████████████████▊ | 582/2230 [3:43:01<9:52:29, 21.57s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:54:39,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:54:39,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:54:39,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:54:45,846 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:54:45,846 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:54:45,846 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:54:52,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:54:52,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:54:52,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 2.3528, 'learning_rate': 0.0002859537572254335, 'epoch': 2.61} [WARNING|modeling_utils.py:388] 2022-03-26 20:54:52,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 20:55:00,218 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 20:55:00,218 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:55:04,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:55:10,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:55:12,705 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 2.0364, 'learning_rate': 0.00028578034682080923, 'epoch': 2.62}
[WARNING|modeling_bart.py:1051] 2022-03-26 20:55:18,348 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 20:55:20,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 20:55:23,047 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 20:55:28,792 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 20:55:31,058 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 1.9785, 'learning_rate': 0.000285606936416185, 'epoch': 2.62}
[WARNING|modeling_bart.py:1051] 2022-03-26 20:55:36,832 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 20:55:39,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 20:55:41,267 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 20:55:43,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 20:55:45,677 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 20:55:47,881 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
26%|███████████████████▉ | 586/2230 [3:44:17<8:51:58, 19.42s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 20:55:51,648 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:55:53,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:55:55,890 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:55:58,000 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:00,062 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:02,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:04,198 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:06,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:08,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:10,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:12,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:18,053 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:19,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:21,848 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:23,868 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:25,719 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:27,555 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:29,409 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:31,202 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:32,962 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:36,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:38,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:39,902 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:41,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:43,238 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:46,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:48,051 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:49,619 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:52,804 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:54,316 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:55,811 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:56:58,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:00,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:03,104 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:04,439 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:07,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:08,432 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:10,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:12,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:14,846 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:16,052 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:18,425 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:20,695 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:23,020 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:24,091 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:27,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:28,229 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:31,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:33,079 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:34,908 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:36,674 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:38,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:41,020 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:43,311 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:44,728 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:47,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:50,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:54,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:57:57,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:58:01,527 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:58:05,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:58:08,586 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:58:12,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:58:15,683 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:58:19,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:58:22,655 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:58:26,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:58:29,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:58:33,101 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:58:36,539 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:58:39,982 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 20:58:39,982 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 5.1178, 'learning_rate': 0.0002833526011560693, 'epoch': 2.68} [WARNING|modeling_utils.py:388] 2022-03-26 20:58:43,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:58:43,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:58:47,074 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:58:50,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:58:50,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:58:53,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:58:53,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:58:57,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:00,812 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:00,812 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:04,231 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:04,231 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:07,663 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:59:07,663 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:11,166 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:11,166 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:14,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:14,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:17,866 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:21,232 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:59:21,232 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:24,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:24,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:27,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:31,327 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:31,327 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:34,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:59:34,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:34,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:39,184 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:39,184 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:39,184 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:39,184 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:39,184 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 20:59:39,184 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:39,184 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:39,184 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:39,184 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:39,184 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:39,184 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 20:59:39,184 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
27%|████████████████████▏ | 601/2230 [3:48:32<11:05:03, 24.50s/it]
{'loss': 3.0486, 'learning_rate': 0.0002828323699421965, 'epoch': 2.7}
{'loss': 2.8205, 'learning_rate': 0.00028265895953757226, 'epoch': 2.7}
{'loss': 2.6073, 'learning_rate': 0.00028248554913294795, 'epoch': 2.7}
{'loss': 2.476, 'learning_rate': 0.00028231213872832365, 'epoch': 2.71}
27%|████████████████████▏ | 601/2230 [3:48:32<11:05:03, 24.50s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 2.2104, 'learning_rate': 0.00028196531791907514, 'epoch': 2.72} 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 2.0377, 'learning_rate': 0.00028179190751445083, 'epoch': 2.72} 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.9675, 'learning_rate': 0.0002816184971098266, 'epoch': 2.73} 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▎ | 605/2230 [3:50:18<11:43:47, 25.99s/it]g-point operations will not be computed-26 20:38:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 1.7629, 'learning_rate': 0.0002814450867052023, 'epoch': 2.73}
{'loss': 1.7966, 'learning_rate': 0.000281271676300578, 'epoch': 2.74}
{'loss': 1.6484, 'learning_rate': 0.0002810982658959537, 'epoch': 2.74}
27%|████████████████████▌ | 612/2230 [3:53:17<11:23:11, 25.33s/it]
{'loss': 1.5786, 'learning_rate': 0.00028092485549132947, 'epoch': 2.74}
{'loss': 1.5999, 'learning_rate': 0.00028075144508670516, 'epoch': 2.75}
{'loss': 1.5781, 'learning_rate': 0.0002805780346820809, 'epoch': 2.75}
{'loss': 1.4, 'learning_rate': 0.0002804046242774566, 'epoch': 2.76}
{'loss': 1.4785, 'learning_rate': 0.00028023121387283235, 'epoch': 2.76}
{'loss': 1.3169, 'learning_rate': 0.00028005780346820804, 'epoch': 2.77}
28%|████████████████████▊ | 618/2230 [3:55:45<10:58:45, 24.52s/it]
{'loss': 1.3737, 'learning_rate': 0.0002798843930635838, 'epoch': 2.77}
28%|████████████████████▊ | 619/2230 [3:56:10<11:01:34, 24.64s/it]
{'loss': 1.3111, 'learning_rate': 0.00027971098265895954, 'epoch': 2.78}
{'loss': 1.2543, 'learning_rate': 0.00027953757225433523, 'epoch': 2.78}
{'loss': 1.1839, 'learning_rate': 0.0002793641618497109, 'epoch': 2.78}
[WARNING|modeling_utils.py:388] 2022-03-26 21:08:41,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
28%|████████████████████▉ | 622/2230 [3:57:21<10:40:57, 23.92s/it]
{'loss': 1.1236, 'learning_rate': 0.00027919075144508667, 'epoch': 2.79}
{'loss': 1.1345, 'learning_rate': 0.0002790173410404624, 'epoch': 2.79}
[WARNING|modeling_bart.py:1051] 2022-03-26 21:09:43,211 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 21:09:59,338 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 1.0724, 'learning_rate': 0.00027867052023121386, 'epoch': 2.8}
[WARNING|modeling_utils.py:388] 2022-03-26 21:10:07,510 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
28%|█████████████████████ | 626/2230 [3:58:53<10:21:45, 23.26s/it]
{'loss': 1.0625, 'learning_rate': 0.00027849710982658955, 'epoch': 2.81}
28%|█████████████████████ | 627/2230 [3:59:16<10:13:59, 22.98s/it]
28%|█████████████████████ | 628/2230 [3:59:38<10:06:13, 22.70s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 21:11:25,146 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:11:29,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:11:33,437 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:11:37,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
28%|█████████████████████▍ | 630/2230 [4:00:21<9:48:20, 22.06s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 21:11:54,064 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.9573, 'learning_rate': 0.0002778034682080925, 'epoch': 2.83}
[WARNING|modeling_bart.py:1051] 2022-03-26 21:12:01,867 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:12:12,140 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
28%|█████████████████████▌ | 631/2230 [4:00:41<9:37:34, 21.67s/it]
{'loss': 0.9805, 'learning_rate': 0.0002776300578034682, 'epoch': 2.83}
[WARNING|modeling_utils.py:388] 2022-03-26 21:12:26,307 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:12:32,585 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.8767, 'learning_rate': 0.0002774566473988439, 'epoch': 2.83}
[WARNING|modeling_utils.py:388] 2022-03-26 21:12:42,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:12:48,975 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:12:55,079 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.8657, 'learning_rate': 0.0002772832369942196, 'epoch': 2.84}
[WARNING|modeling_utils.py:388] 2022-03-26 21:13:01,270 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 21:13:05,683 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:13:11,656 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 21:13:15,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:13:21,640 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:13:23,959 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:13:29,782 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:13:32,055 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.8549, 'learning_rate': 0.00027693641618497107, 'epoch': 2.85}
[WARNING|modeling_bart.py:1051] 2022-03-26 21:13:36,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 21:13:40,182 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:13:42,501 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 21:13:46,592 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
29%|█████████████████████▋ | 636/2230 [4:02:19<8:37:27, 19.48s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:13:54,426 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 21:13:58,106 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:00,245 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:02,366 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:04,462 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:06,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:08,658 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:10,813 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:12,857 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:14,883 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:20,680 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:22,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:24,544 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:26,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:28,512 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:30,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:32,260 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:34,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:37,772 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:39,602 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:41,483 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:44,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:46,599 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:48,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:49,892 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:53,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:54,889 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:56,478 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:14:58,047 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:01,079 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:02,564 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:05,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:07,133 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:08,565 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:11,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:12,641 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:15,278 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:16,554 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:19,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:20,387 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:22,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:25,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:27,392 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:29,575 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:31,680 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:32,685 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:35,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:37,494 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:39,272 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:41,021 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:43,679 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:45,319 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:47,645 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:49,079 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:51,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:55,011 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:15:58,552 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:16:02,100 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:16:05,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:16:09,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:16:12,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:16:16,020 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:16:19,528 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:16:22,907 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:16:26,323 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:16:26,323 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:16:29,701 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:16:33,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:16:33,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:16:36,441 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 21:16:36,441 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:16:39,777 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:16:39,777 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:16:43,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:16:43,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:16:46,645 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:16:46,645 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 21:16:49,973 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:16:53,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:16:53,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:16:56,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:16:56,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:00,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:03,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 21:17:03,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:06,638 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:06,638 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:09,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:09,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:13,278 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:13,278 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 21:17:16,545 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:19,816 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:19,816 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:23,130 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:26,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:26,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:29,660 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 21:17:29,660 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:32,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:36,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:36,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:36,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:40,525 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:40,525 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 21:17:43,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:43,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:43,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:43,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:43,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:43,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:17:43,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 2.5995, 'learning_rate': 0.00027416184971098265, 'epoch': 2.92}
{'loss': 2.1182, 'learning_rate': 0.00027398843930635835, 'epoch': 2.92}
{'loss': 1.9193, 'learning_rate': 0.0002738150289017341, 'epoch': 2.93}
 29%|█████████████████████▉ | 654/2230 [4:07:48<10:47:34, 24.65s/it]
{'loss': 1.7254, 'learning_rate': 0.00027364161849710984, 'epoch': 2.93}
 29%|██████████████████████ | 655/2230 [4:08:13<10:46:19, 24.62s/it]
{'loss': 1.513, 'learning_rate': 0.00027346820809248554, 'epoch': 2.94}
{'loss': 1.3363, 'learning_rate': 0.00027329479768786123, 'epoch': 2.94}
 29%|██████████████████████ | 657/2230 [4:09:02<10:45:52, 24.64s/it]
{'loss': 1.1537, 'learning_rate': 0.000273121387283237, 'epoch': 2.95}
{'loss': 1.1289, 'learning_rate': 0.0002729479768786127, 'epoch': 2.95}
29%|██████████████████████ | 657/2230 [4:09:02<10:45:52, 24.64s/it]g-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 29%|██████████████████████ | 657/2230 [4:09:02<10:45:52, 24.64s/it]g-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 29%|██████████████████████ | 657/2230 [4:09:02<10:45:52, 24.64s/it]g-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 29%|██████████████████████ | 657/2230 [4:09:02<10:45:52, 24.64s/it]g-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 29%|██████████████████████ | 657/2230 [4:09:02<10:45:52, 24.64s/it]g-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▏ | 659/2230 [4:09:49<10:29:37, 24.05s/it]g-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▏ | 659/2230 [4:09:49<10:29:37, 24.05s/it]g-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.0504, 'learning_rate': 0.0002727745664739884, 'epoch': 2.96} 30%|██████████████████████▏ | 659/2230 [4:09:49<10:29:37, 24.05s/it]g-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▏ | 659/2230 [4:09:49<10:29:37, 24.05s/it]g-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... 30%|██████████████████████▏ | 659/2230 [4:09:49<10:29:37, 24.05s/it]g-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▏ | 659/2230 [4:09:49<10:29:37, 24.05s/it]g-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▏ | 659/2230 [4:09:49<10:29:37, 24.05s/it]g-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▏ | 659/2230 [4:09:49<10:29:37, 24.05s/it]g-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▏ | 659/2230 [4:09:49<10:29:37, 24.05s/it]g-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▏ | 659/2230 [4:09:49<10:29:37, 24.05s/it]g-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▏ | 659/2230 [4:09:49<10:29:37, 24.05s/it]g-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▏ | 660/2230 [4:10:12<10:19:36, 23.68s/it]g-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▏ | 660/2230 [4:10:12<10:19:36, 23.68s/it]g-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 1.0136, 'learning_rate': 0.0002726011560693641, 'epoch': 2.96}
[WARNING|modeling_utils.py:388] 2022-03-26 21:21:52,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
 30%|██████████████████████▏ | 661/2230 [4:10:34<10:08:11, 23.26s/it]
{'loss': 0.9461, 'learning_rate': 0.00027242774566473986, 'epoch': 2.96}
[WARNING|modeling_utils.py:388] 2022-03-26 21:22:26,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.9157, 'learning_rate': 0.0002722543352601156, 'epoch': 2.97}
[WARNING|modeling_utils.py:388] 2022-03-26 21:22:36,997 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:22:43,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 21:22:47,604 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 30%|██████████████████████▌ | 663/2230 [4:11:17<9:43:29, 22.34s/it]
{'loss': 0.7936, 'learning_rate': 0.0002720809248554913, 'epoch': 2.97}
[WARNING|modeling_utils.py:388] 2022-03-26 21:22:53,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:22:59,420 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 21:23:03,791 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 21:23:07,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 21:23:12,117 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 21:23:15,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:23:18,203 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:23:18,203 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:23:22,243 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:23:24,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:23:26,485 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:23:26,485 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 21:23:26,485 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:23:30,026 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:23:32,020 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:23:33,940 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:23:35,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:23:37,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:23:39,478 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 21:23:41,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:23:41,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:23:43,100 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:23:44,813 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:23:48,087 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:23:49,651 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:23:51,190 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 21:23:54,034 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:23:54,034 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:23:55,501 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:23:57,967 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:24:00,289 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:24:02,453 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:24:02,453 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 21:24:04,545 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:24:06,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:24:08,150 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:24:10,511 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:24:10,511 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:24:11,229 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:24:13,768 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 21:24:13,768 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:24:17,330 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:24:17,330 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:24:20,914 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:24:24,451 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:24:24,451 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:24:27,890 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 21:24:27,890 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:24:31,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:24:34,866 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:24:34,866 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:24:38,328 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:24:38,328 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.6436, 'learning_rate': 0.0002708670520231214, 'epoch': 3.0} [WARNING|modeling_utils.py:388] 2022-03-26 21:24:41,931 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:13:52,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 3.0477, 'learning_rate': 0.00027069364161849707, 'epoch': 3.01}
{'loss': 1.9186, 'learning_rate': 0.0002705202312138728, 'epoch': 3.01}
{'loss': 1.7018, 'learning_rate': 0.0002703468208092485, 'epoch': 3.02}
{'loss': 1.4485, 'learning_rate': 0.00027017341040462426, 'epoch': 3.02}
{'loss': 1.1206, 'learning_rate': 0.00027, 'epoch': 3.03}
30%|██████████████████████▋ | 676/2230 [4:15:50<11:08:30, 25.81s/it]
{'loss': 1.1319, 'learning_rate': 0.0002698265895953757, 'epoch': 3.03}
{'loss': 1.0149, 'learning_rate': 0.0002696531791907514, 'epoch': 3.04}
{'loss': 0.809, 'learning_rate': 0.00026947976878612714, 'epoch': 3.04}
{'loss': 0.8025, 'learning_rate': 0.0002693063583815029, 'epoch': 3.04}
 30%|██████████████████████▊ | 680/2230 [4:17:34<11:09:45, 25.93s/it]
{'loss': 0.8262, 'learning_rate': 0.0002691329479768786, 'epoch': 3.05}
{'loss': 0.6764, 'learning_rate': 0.00026895953757225433, 'epoch': 3.05}
{'loss': 0.7046, 'learning_rate': 0.0002687861271676301, 'epoch': 3.06}
 31%|██████████████████████▉ | 683/2230 [4:18:52<11:05:12, 25.80s/it]
{'loss': 0.6042, 'learning_rate': 0.00026861271676300577, 'epoch': 3.06}
 31%|███████████████████████ | 684/2230 [4:19:17<10:59:26, 25.59s/it]
{'loss': 0.553, 'learning_rate': 0.00026843930635838146, 'epoch': 3.07}
 31%|███████████████████████ | 685/2230 [4:19:42<10:54:23, 25.41s/it]
{'loss': 0.6246, 'learning_rate': 0.0002682658959537572, 'epoch': 3.07}
{'loss': 0.6005, 'learning_rate': 0.00026809248554913296, 'epoch': 3.08}
{'loss': 0.5905, 'learning_rate': 0.00026791907514450865, 'epoch': 3.08}
{'loss': 0.5917, 'learning_rate': 0.00026774566473988435, 'epoch': 3.09}
{'loss': 0.5212, 'learning_rate': 0.0002675722543352601, 'epoch': 3.09}
{'loss': 0.5089, 'learning_rate': 0.00026739884393063584, 'epoch': 3.09}
{'loss': 0.4883, 'learning_rate': 0.00026722543352601153, 'epoch': 3.1}
{'loss': 0.5311, 'learning_rate': 0.0002670520231213873, 'epoch': 3.1}
 31%|███████████████████████▎ | 693/2230 [4:22:58<10:21:47, 24.27s/it]
{'loss': 0.4886, 'learning_rate': 0.000266878612716763, 'epoch': 3.11}
31%|███████████████████████▎ | 694/2230 [4:23:22<10:22:48, 24.33s/it]
{'loss': 0.4356, 'learning_rate': 0.0002665317919075144, 'epoch': 3.12}
{'loss': 0.5206, 'learning_rate': 0.00026635838150289016, 'epoch': 3.12}
{'loss': 0.409, 'learning_rate': 0.00026618497109826586, 'epoch': 3.13}
{'loss': 0.431, 'learning_rate': 0.00026601156069364155, 'epoch': 3.13}
[WARNING|modeling_utils.py:388] 2022-03-26 21:36:33,832 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:36:38,055 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:36:47,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.448, 'learning_rate': 0.0002658381502890173, 'epoch': 3.13}
[WARNING|modeling_utils.py:388] 2022-03-26 21:37:06,378 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.4387, 'learning_rate': 0.00026566473988439305, 'epoch': 3.14}
[WARNING|modeling_utils.py:388] 2022-03-26 21:37:18,675 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
31%|███████████████████████▉ | 701/2230 [4:26:04<9:47:13, 23.04s/it]
{'loss': 0.3993, 'learning_rate': 0.00026549132947976874, 'epoch': 3.14}
[WARNING|modeling_utils.py:388] 2022-03-26 21:37:44,997 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:37:49,151 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:37:53,311 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:37:57,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:38:01,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:38:05,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:38:09,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:38:17,553 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.4112, 'learning_rate': 0.00026514450867052024, 'epoch': 3.15}
[WARNING|modeling_utils.py:388] 2022-03-26 21:38:24,151 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:38:35,883 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.419, 'learning_rate': 0.00026497109826589593, 'epoch': 3.16}
[WARNING|modeling_bart.py:1051] 2022-03-26 21:38:48,383 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 21:38:56,586 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.4739, 'learning_rate': 0.0002647976878612716, 'epoch': 3.16}
[WARNING|modeling_utils.py:388] 2022-03-26 21:39:06,868 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:39:13,157 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 21:39:21,378 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.4217, 'learning_rate': 0.00026462427745664737, 'epoch': 3.17}
[WARNING|modeling_utils.py:388] 2022-03-26 21:39:31,446 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:39:37,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:39:39,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:39:46,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:39:51,927 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:39:54,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:40:00,022 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:40:02,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.4224, 'learning_rate': 0.0002642774566473988, 'epoch': 3.17}
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:09,989 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:12,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:14,445 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:16,646 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:18,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:21,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.3968, 'learning_rate': 0.00026410404624277456, 'epoch': 3.18}
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:24,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:26,521 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:28,648 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:30,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:32,856 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:34,932 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
32%|████████████████████████▏ | 710/2230 [4:29:04<7:58:10, 18.88s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:37,085 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:39,159 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:41,187 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:43,178 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:45,150 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:47,095 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:49,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:50,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
32%|████████████████████████▏ | 711/2230 [4:29:20<7:35:29, 17.99s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:52,999 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:54,944 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:56,816 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:40:58,646 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:00,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:02,296 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:05,913 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
32%|████████████████████████▎ | 712/2230 [4:29:35<7:11:20, 17.05s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:07,765 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:09,509 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:11,283 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:12,997 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:15,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:17,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:18,778 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
32%|████████████████████████▎ | 713/2230 [4:29:49<6:50:44, 16.25s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:22,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:23,737 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:25,331 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:28,409 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:29,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:31,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
32%|████████████████████████▎ | 714/2230 [4:30:01<6:21:44, 15.11s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:34,495 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:35,948 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:37,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:40,122 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:41,481 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:44,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
32%|████████████████████████▎ | 715/2230 [4:30:13<5:50:54, 13.90s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:45,507 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:47,989 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:49,207 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:51,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:53,925 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:55,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:56,301 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:41:58,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:00,555 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:02,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:03,673 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:04,642 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:06,528 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:08,315 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:10,087 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:11,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:12,700 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:14,265 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:16,501 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
32%|████████████████████████▌ | 719/2230 [4:30:45<3:53:11, 9.26s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:19,078 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:22,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:26,338 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:29,938 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:33,517 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:37,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:40,640 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:44,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
32%|████████████████████████▌ | 720/2230 [4:31:14<6:20:54, 15.14s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:47,808 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:51,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:54,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:42:58,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:43:01,849 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:43:05,302 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:43:08,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:43:12,231 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
32%|████████████████████████▌ | 721/2230 [4:31:42<7:58:40, 19.03s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 21:43:15,927 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 2.5798, 'learning_rate': 0.0002620231213872832, 'epoch': 3.23}
[WARNING|modeling_bart.py:1051] 2022-03-26 21:43:19,418 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:43:22,871 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:43:26,390 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:43:29,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:43:33,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:43:36,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:43:40,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
32%|████████████████████████▌ | 722/2230 [4:32:10<9:04:13, 21.65s/it][WARNING|modeling_bart.py:1051] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 32%|████████████████████████▌ | 722/2230 [4:32:10<9:04:13, 21.65s/it][WARNING|modeling_bart.py:1051] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:43:47,200 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:43:50,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:43:50,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:43:53,984 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:43:53,984 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:43:57,417 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:00,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:00,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.2022, 'learning_rate': 0.00026167630057803465, 'epoch': 3.24} [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.0921, 'learning_rate': 0.0002615028901734104, 'epoch': 3.25} [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.013, 'learning_rate': 0.0002613294797687861, 'epoch': 3.25} [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.9056, 'learning_rate': 0.00026115606936416184, 'epoch': 3.26} [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.8245, 'learning_rate': 0.0002609826589595376, 'epoch': 3.26} [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.7315, 'learning_rate': 0.0002608092485549133, 'epoch': 3.26} [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.6466, 'learning_rate': 0.000260635838150289, 'epoch': 3.27} [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 21:44:04,125 >> `use_cache=True` is incompatible with gradient checkpointing. 
{'loss': 0.5553, 'learning_rate': 0.0002604624277456647, 'epoch': 3.27}
{'loss': 0.5448, 'learning_rate': 0.00026028901734104047, 'epoch': 3.28}
{'loss': 0.5247, 'learning_rate': 0.00026011560693641616, 'epoch': 3.28}
{'loss': 0.5081, 'learning_rate': 0.00025994219653179186, 'epoch': 3.29}
{'loss': 0.4833, 'learning_rate': 0.0002597687861271676, 'epoch': 3.29}
{'loss': 0.4507, 'learning_rate': 0.00025959537572254335, 'epoch': 3.3}
{'loss': 0.5001, 'learning_rate': 0.00025942196531791905, 'epoch': 3.3}
 33%|████████████████████████▊ | 737/2230 [4:38:43<10:28:26, 25.26s/it]
{'loss': 0.4418, 'learning_rate': 0.0002592485549132948, 'epoch': 3.3}
{'loss': 0.418, 'learning_rate': 0.0002590751445086705, 'epoch': 3.31}
[WARNING|modeling_utils.py:388] 2022-03-26 21:50:54,557 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.4375, 'learning_rate': 0.00025890173410404624, 'epoch': 3.31}
{'loss': 0.4326, 'learning_rate': 0.00025872832369942193, 'epoch': 3.32}
{'loss': 0.3794, 'learning_rate': 0.0002585549132947977, 'epoch': 3.32}
33%|████████████████████████▉ | 742/2230 [4:40:46<10:08:19, 24.53s/it]
{'loss': 0.4532, 'learning_rate': 0.00025838150289017337, 'epoch': 3.33}
{'loss': 0.3637, 'learning_rate': 0.0002582080924855491, 'epoch': 3.33}
{'loss': 0.3888, 'learning_rate': 0.0002580346820809248, 'epoch': 3.34}
[WARNING|modeling_utils.py:388] 2022-03-26 21:53:09,718 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 21:53:28,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
33%|█████████████████████████▍ | 745/2230 [4:41:58<9:58:15, 24.17s/it]
{'loss': 0.3813, 'learning_rate': 0.00025786127167630056, 'epoch': 3.34}
[WARNING|modeling_utils.py:388] 2022-03-26 21:53:36,096 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3921, 'learning_rate': 0.00025768786127167625, 'epoch': 3.35}
33%|█████████████████████████▍ | 747/2230 [4:42:44<9:45:29, 23.69s/it]
{'loss': 0.387, 'learning_rate': 0.000257514450867052, 'epoch': 3.35}
{'loss': 0.4, 'learning_rate': 0.00025734104046242775, 'epoch': 3.35}
[WARNING|modeling_utils.py:388] 2022-03-26 21:54:49,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:54:53,868 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3746, 'learning_rate': 0.00025716763005780344, 'epoch': 3.36}
[WARNING|modeling_utils.py:388] 2022-03-26 21:54:53,868 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:55:28,641 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:55:28,641 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:55:28,641 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:55:28,641 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:55:28,641 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:55:28,641 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:43:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 21:55:41,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3834, 'learning_rate': 0.0002568208092485549, 'epoch': 3.37}
[WARNING|modeling_utils.py:388] 2022-03-26 21:55:57,731 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:56:05,916 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:56:09,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3827, 'learning_rate': 0.00025664739884393063, 'epoch': 3.37}
[WARNING|modeling_utils.py:388] 2022-03-26 21:56:14,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:56:32,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:56:40,506 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:56:50,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.4256, 'learning_rate': 0.00025630057803468207, 'epoch': 3.38}
[WARNING|modeling_utils.py:388] 2022-03-26 21:57:05,184 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
 34%|█████████████████████████▋ | 755/2230 [4:45:40<8:48:51, 21.51s/it]
{'loss': 0.3919, 'learning_rate': 0.0002561271676300578, 'epoch': 3.39}
[WARNING|modeling_utils.py:388] 2022-03-26 21:57:19,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:57:25,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
 34%|█████████████████████████▊ | 756/2230 [4:46:01<8:39:26, 21.14s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 21:57:34,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.329, 'learning_rate': 0.0002559537572254335, 'epoch': 3.39}
[WARNING|modeling_utils.py:388] 2022-03-26 21:57:47,735 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 21:57:52,145 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 34%|█████████████████████████▊ | 757/2230 [4:46:21<8:35:29, 21.00s/it]
{'loss': 0.364, 'learning_rate': 0.0002557803468208092, 'epoch': 3.39}
[WARNING|modeling_utils.py:388] 2022-03-26 21:58:00,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 21:58:04,498 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 21:58:08,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 21:58:16,354 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 21:58:20,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:58:22,571 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 21:58:26,705 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 21:58:30,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:58:32,806 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 21:58:36,891 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 21:58:39,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 21:58:42,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:58:44,946 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:58:47,052 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:58:49,165 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:58:51,381 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:58:53,509 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:58:55,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:58:57,613 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:58:59,610 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:59:01,600 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:59:03,602 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:59:05,556 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:59:07,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:57:34,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:59:09,483 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:57:34,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:59:11,368 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:57:34,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:59:13,244 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:57:34,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:59:15,106 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:57:34,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:59:16,946 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:57:34,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 21:59:18,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 21:57:34,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 21:59:20,544 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 21:59:22,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 21:59:24,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 21:59:26,041 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 21:59:28,593 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 21:59:31,987 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:59:33,653 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 21:59:35,341 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 21:59:37,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 21:59:38,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 21:59:41,894 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 21:59:43,451 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:59:45,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 21:59:48,052 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 21:59:49,644 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 21:59:51,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 21:59:53,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 21:59:55,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 21:59:58,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 21:59:59,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:00:02,085 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:00:04,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:00:05,717 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:00:08,023 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:00:10,398 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:00:12,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:00:14,630 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:00:16,685 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:00:18,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:00:20,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:00:22,405 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:00:24,152 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:00:26,765 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:00:28,370 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:00:30,565 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:00:31,986 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.3134, 'learning_rate': 0.0002536994219653179, 'epoch': 3.45} [WARNING|modeling_utils.py:388] 2022-03-26 22:00:35,890 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:00:39,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:00:43,012 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:00:46,610 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:00:50,146 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:00:53,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:00:57,133 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:01:00,615 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 2.7246, 'learning_rate': 0.0002535260115606936, 'epoch': 3.45} [WARNING|modeling_utils.py:388] 2022-03-26 22:01:04,207 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:01:07,672 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:01:11,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:01:14,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:01:18,051 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:01:21,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:01:24,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:01:28,431 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:01:32,011 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:01:35,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:01:38,869 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:01:42,328 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:01:45,706 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:01:49,132 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:01:52,568 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:01:55,968 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:01:59,485 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:02:02,850 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:02:06,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:02:09,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:02:12,975 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:02:16,315 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.8525, 'learning_rate': 0.0002530057803468208, 'epoch': 3.47}
35%|██████████████████████████▍ | 774/2230 [4:51:17<9:48:20, 24.24s/it]
{'loss': 0.8248, 'learning_rate': 0.0002528323699421965, 'epoch': 3.47}
{'loss': 0.6959, 'learning_rate': 0.00025265895953757223, 'epoch': 3.48}
{'loss': 0.6066, 'learning_rate': 0.000252485549132948, 'epoch': 3.48}
{'loss': 0.5454, 'learning_rate': 0.0002523121387283237, 'epoch': 3.48}
{'loss': 0.5132, 'learning_rate': 0.00025213872832369937, 'epoch': 3.49}
{'loss': 0.4923, 'learning_rate': 0.0002519653179190751, 'epoch': 3.49}
{'loss': 0.4915, 'learning_rate': 0.00025179190751445086, 'epoch': 3.5}
{'loss': 0.4586, 'learning_rate': 0.00025161849710982656, 'epoch': 3.5}
{'loss': 0.4383, 'learning_rate': 0.0002514450867052023, 'epoch': 3.51}
{'loss': 0.3949, 'learning_rate': 0.00025127167630057805, 'epoch': 3.51}
 35%|██████████████████████████▎ | 784/2230 [4:55:40<10:20:42, 25.76s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 22:07:33,379 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.3954, 'learning_rate': 0.00025092485549132944, 'epoch': 3.52}
[WARNING|modeling_bart.py:1051] 2022-03-26 22:07:33,379 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 21:57:34,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:07:33,379 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 21:57:34,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:07:33,379 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 21:57:34,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:07:33,379 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 21:57:34,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:07:33,379 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 21:57:34,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:07:33,379 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 21:57:34,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.4317, 'learning_rate': 0.0002507514450867052, 'epoch': 3.52} [WARNING|modeling_bart.py:1051] 2022-03-26 22:07:33,379 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 21:57:34,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.4151, 'learning_rate': 0.00025057803468208094, 'epoch': 3.53}
35%|██████████████████████████▌ | 788/2230 [4:57:20<10:09:15, 25.35s/it]
{'loss': 0.349, 'learning_rate': 0.00025040462427745663, 'epoch': 3.53}
{'loss': 0.3537, 'learning_rate': 0.0002502312138728323, 'epoch': 3.54}
{'loss': 0.3305, 'learning_rate': 0.00025005780346820807, 'epoch': 3.54}
{'loss': 0.3834, 'learning_rate': 0.00024988439306358376, 'epoch': 3.55}
{'loss': 0.3788, 'learning_rate': 0.0002497109826589595, 'epoch': 3.55}
{'loss': 0.3858, 'learning_rate': 0.00024953757225433526, 'epoch': 3.56}
[WARNING|modeling_utils.py:388] 2022-03-26 22:11:10,170 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
36%|███████████████████████████ | 794/2230 [4:59:46<9:43:50, 24.39s/it]
{'loss': 0.3739, 'learning_rate': 0.00024936416184971095, 'epoch': 3.56}
{'loss': 0.3757, 'learning_rate': 0.00024919075144508665, 'epoch': 3.57}
[WARNING|modeling_utils.py:388] 2022-03-26 22:11:53,651 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
36%|███████████████████████████▏ | 796/2230 [5:00:33<9:29:39, 23.84s/it]
{'loss': 0.3295, 'learning_rate': 0.0002490173410404624, 'epoch': 3.57}
36%|███████████████████████████▏ | 797/2230 [5:00:56<9:22:31, 23.55s/it]
{'loss': 0.2999, 'learning_rate': 0.00024884393063583814, 'epoch': 3.57}
{'loss': 0.3492, 'learning_rate': 0.00024867052023121384, 'epoch': 3.58}
36%|███████████████████████████▏ | 799/2230 [5:01:41<9:09:11, 23.03s/it]
{'loss': 0.3537, 'learning_rate': 0.0002484971098265896, 'epoch': 3.58}
[WARNING|modeling_utils.py:388] 2022-03-26 22:13:29,897 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:13:34,006 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3539, 'learning_rate': 0.00024832369942196533, 'epoch': 3.59}
{'loss': 0.3874, 'learning_rate': 0.000248150289017341, 'epoch': 3.59}
[WARNING|modeling_utils.py:388] 2022-03-26 22:14:10,644 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3529, 'learning_rate': 0.0002479768786127167, 'epoch': 3.6}
[WARNING|modeling_utils.py:388] 2022-03-26 22:14:33,079 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
36%|███████████████████████████▎ | 803/2230 [5:03:08<8:42:38, 21.98s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 22:14:47,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:14:57,850 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
36%|███████████████████████████▍ | 804/2230 [5:03:29<8:34:04, 21.63s/it]
{'loss': 0.3855, 'learning_rate': 0.0002476300578034682, 'epoch': 3.61}
[WARNING|modeling_utils.py:388] 2022-03-26 22:15:08,198 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:15:22,133 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3391, 'learning_rate': 0.0002474566473988439, 'epoch': 3.61}
[WARNING|modeling_utils.py:388] 2022-03-26 22:15:28,448 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:15:34,626 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:15:40,762 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3225, 'learning_rate': 0.0002472832369942196, 'epoch': 3.61}
[WARNING|modeling_bart.py:1051] 2022-03-26 22:15:53,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 22:15:57,341 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 22:16:01,595 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 22:16:05,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:16:11,434 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:16:13,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 22:16:17,916 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 22:16:21,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:16:24,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 22:16:28,254 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:16:33,790 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:16:36,017 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 22:16:39,843 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:16:42,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:16:44,156 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:16:46,286 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 22:16:50,118 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:16:52,201 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:16:54,264 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
36%|███████████████████████████▌ | 810/2230 [5:05:23<7:22:31, 18.70s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 22:16:56,430 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:16:58,507 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:00,530 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:02,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:04,478 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:06,452 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:08,392 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:10,305 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
36%|███████████████████████████▋ | 811/2230 [5:05:39<7:02:43, 17.87s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:12,350 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:14,243 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:16,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:17,954 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:19,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:21,614 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:25,183 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
36%|███████████████████████████▋ | 812/2230 [5:05:54<6:40:32, 16.95s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:27,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:28,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:30,544 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:32,263 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:34,793 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:36,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:38,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
36%|███████████████████████████▋ | 813/2230 [5:06:08<6:22:33, 16.20s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:41,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:43,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:44,643 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:46,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:49,249 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:50,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
37%|███████████████████████████▋ | 814/2230 [5:06:21<5:55:18, 15.06s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:53,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:55,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:56,579 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:17:59,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:00,568 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:03,154 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
37%|███████████████████████████▊ | 815/2230 [5:06:32<5:24:57, 13.78s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:04,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:07,001 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:09,391 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:11,685 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:14,031 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:15,116 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:17,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:19,342 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:21,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:22,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:23,351 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:25,197 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:27,867 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
37%|███████████████████████████▉ | 818/2230 [5:06:57<3:57:42, 10.10s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:29,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:31,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:33,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:35,214 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
37%|███████████████████████████▉ | 819/2230 [5:07:04<3:37:04, 9.23s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:37,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:41,686 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:45,338 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:48,953 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:52,543 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:56,132 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:18:59,687 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:19:03,225 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
37%|███████████████████████████▉ | 820/2230 [5:07:33<5:56:52, 15.19s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 22:19:06,886 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:19:10,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:19:13,872 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:19:17,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:19:20,809 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:19:24,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:19:27,737 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:19:31,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
37%|███████████████████████████▉ | 821/2230 [5:08:01<7:26:02, 18.99s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 22:19:34,737 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 1.0919, 'learning_rate': 0.0002446820809248555, 'epoch': 3.68}
[WARNING|modeling_bart.py:1051] 2022-03-26 22:19:38,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:19:41,667 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:19:45,165 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:19:48,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:19:52,285 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:19:55,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:19:59,187 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
37%|████████████████████████████ | 822/2230 [5:08:29<8:28:49, 21.68s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 22:20:02,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:20:06,049 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:20:09,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:20:12,964 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:20:16,388 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:20:19,859 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:20:23,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.7192, 'learning_rate': 0.0002443352601156069, 'epoch': 3.69}
{'loss': 0.7089, 'learning_rate': 0.00024416184971098263, 'epoch': 3.7}
{'loss': 0.5815, 'learning_rate': 0.00024398843930635838, 'epoch': 3.7}
{'loss': 0.5092, 'learning_rate': 0.00024381502890173407, 'epoch': 3.7}
{'loss': 0.5179, 'learning_rate': 0.0002436416184971098, 'epoch': 3.71}
{'loss': 0.4872, 'learning_rate': 0.00024346820809248554, 'epoch': 3.71}
{'loss': 0.4405, 'learning_rate': 0.00024329479768786126, 'epoch': 3.72}
 37%|███████████████████████████▉ | 830/2230 [5:12:05<10:13:39, 26.30s/it]
{'loss': 0.4115, 'learning_rate': 0.00024312138728323698, 'epoch': 3.72}
{'loss': 0.4027, 'learning_rate': 0.00024294797687861267, 'epoch': 3.73}
{'loss': 0.3596, 'learning_rate': 0.00024277456647398842, 'epoch': 3.73}
{'loss': 0.3759, 'learning_rate': 0.00024260115606936414, 'epoch': 3.74}
 37%|████████████████████████████ | 834/2230 [5:13:49<10:05:03, 26.01s/it]
37%|████████████████████████████ | 834/2230 [5:13:49<10:05:03, 26.01s/it] Setting `use_cache=False`...1] 2022-03-26 22:20:02,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 37%|████████████████████████████ | 834/2230 [5:13:49<10:05:03, 26.01s/it] Setting `use_cache=False`...1] 2022-03-26 22:20:02,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 37%|████████████████████████████ | 834/2230 [5:13:49<10:05:03, 26.01s/it] Setting `use_cache=False`...1] 2022-03-26 22:20:02,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 37%|████████████████████████████ | 834/2230 [5:13:49<10:05:03, 26.01s/it] Setting `use_cache=False`...1] 2022-03-26 22:20:02,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 37%|████████████████████████████ | 834/2230 [5:13:49<10:05:03, 26.01s/it] Setting `use_cache=False`...1] 2022-03-26 22:20:02,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.3, 'learning_rate': 0.00024225433526011558, 'epoch': 3.74} 37%|████████████████████████████ | 834/2230 [5:13:49<10:05:03, 26.01s/it] Setting `use_cache=False`...1] 2022-03-26 22:20:02,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 37%|████████████████████████████ | 834/2230 [5:13:49<10:05:03, 26.01s/it] Setting `use_cache=False`...1] 2022-03-26 22:20:02,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 37%|████████████████████████████ | 834/2230 [5:13:49<10:05:03, 26.01s/it] Setting `use_cache=False`...1] 2022-03-26 22:20:02,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 22:26:11,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.3155, 'learning_rate': 0.00024208092485549133, 'epoch': 3.75}
{'loss': 0.3328, 'learning_rate': 0.00024190751445086702, 'epoch': 3.75}
{'loss': 0.3285, 'learning_rate': 0.00024173410404624275, 'epoch': 3.76}
[WARNING|modeling_bart.py:1051] 2022-03-26 22:27:08,924 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.3116, 'learning_rate': 0.00024156069364161847, 'epoch': 3.76}
{'loss': 0.4866, 'learning_rate': 0.0002413872832369942, 'epoch': 3.77}
{'loss': 0.3259, 'learning_rate': 0.00024121387283236993, 'epoch': 3.77}
{'loss': 0.3017, 'learning_rate': 0.00024104046242774563, 'epoch': 3.78}
{'loss': 0.3129, 'learning_rate': 0.00024086705202312135, 'epoch': 3.78}
38%|████████████████████████████▊ | 844/2230 [5:17:58<9:28:22, 24.61s/it]
{'loss': 0.328, 'learning_rate': 0.0002406936416184971, 'epoch': 3.78}
38%|████████████████████████████▊ | 844/2230 [5:17:58<9:28:22, 24.61s/it] Setting `use_cache=False`...1] 2022-03-26 22:20:02,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 38%|████████████████████████████▊ | 844/2230 [5:17:58<9:28:22, 24.61s/it] Setting `use_cache=False`...1] 2022-03-26 22:20:02,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 38%|████████████████████████████▊ | 844/2230 [5:17:58<9:28:22, 24.61s/it] Setting `use_cache=False`...1] 2022-03-26 22:20:02,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 38%|████████████████████████████▊ | 844/2230 [5:17:58<9:28:22, 24.61s/it] Setting `use_cache=False`...1] 2022-03-26 22:20:02,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 38%|████████████████████████████▊ | 844/2230 [5:17:58<9:28:22, 24.61s/it] Setting `use_cache=False`...1] 2022-03-26 22:20:02,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 38%|████████████████████████████▊ | 844/2230 [5:17:58<9:28:22, 24.61s/it] Setting `use_cache=False`...1] 2022-03-26 22:20:02,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 38%|████████████████████████████▊ | 844/2230 [5:17:58<9:28:22, 24.61s/it] Setting `use_cache=False`...1] 2022-03-26 22:20:02,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 22:29:50,975 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 22:20:02,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.3282, 'learning_rate': 0.00024052023121387282, 'epoch': 3.79}
38%|████████████████████████████▊ | 846/2230 [5:18:45<9:16:27, 24.12s/it]
{'loss': 0.2992, 'learning_rate': 0.00024034682080924854, 'epoch': 3.79}
[WARNING|modeling_utils.py:388] 2022-03-26 22:30:21,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2855, 'learning_rate': 0.00024017341040462423, 'epoch': 3.8}
{'loss': 0.3201, 'learning_rate': 0.00023999999999999998, 'epoch': 3.8}
{'loss': 0.264, 'learning_rate': 0.0002398265895953757, 'epoch': 3.81}
{'loss': 0.3239, 'learning_rate': 0.00023965317919075142, 'epoch': 3.81}
[WARNING|modeling_bart.py:1051] 2022-03-26 22:31:57,283 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.2577, 'learning_rate': 0.00023947976878612714, 'epoch': 3.82}
[WARNING|modeling_utils.py:388] 2022-03-26 22:32:17,326 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:32:21,527 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:32:25,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:32:29,844 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:32:33,924 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:32:38,066 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:32:42,064 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:32:53,965 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2993, 'learning_rate': 0.0002391329479768786, 'epoch': 3.83}
[WARNING|modeling_utils.py:388] 2022-03-26 22:33:08,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2835, 'learning_rate': 0.0002389595375722543, 'epoch': 3.83}
[WARNING|modeling_utils.py:388] 2022-03-26 22:33:23,029 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 22:33:31,646 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 22:33:37,208 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3059, 'learning_rate': 0.00023878612716763002, 'epoch': 3.83}
[WARNING|modeling_utils.py:388] 2022-03-26 22:33:47,459 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:33:53,675 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2899, 'learning_rate': 0.00023861271676300577, 'epoch': 3.84} [WARNING|modeling_bart.py:1051] 2022-03-26 22:34:02,054 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 22:34:12,044 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:34:15,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 38%|█████████████████████████████▏ | 857/2230 [5:22:47<8:01:29, 21.04s/it] [WARNING|modeling_bart.py:1051] 2022-03-26 22:34:20,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 22:34:24,281 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:34:30,251 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-26 22:34:34,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 22:34:38,485 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.2783, 'learning_rate': 0.0002382658959537572, 'epoch': 3.85} [WARNING|modeling_bart.py:1051] 2022-03-26 22:34:42,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 22:34:46,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:34:48,846 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 22:34:52,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:34:55,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 39%|█████████████████████████████▎ | 859/2230 [5:23:24<7:32:34, 19.81s/it] [WARNING|modeling_utils.py:388] 2022-03-26 22:34:59,026 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:35:01,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-26 22:35:03,313 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 22:35:07,189 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:35:09,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:35:11,320 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:35:13,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:35:15,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:35:17,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:35:19,592 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:35:21,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:35:23,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:35:25,501 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:35:27,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:35:29,379 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:35:31,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:35:33,297 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:35:35,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:35:37,043 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:35:38,885 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:35:40,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:35:42,553 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:35:46,233 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:35:47,991 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:35:49,706 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:35:51,445 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:35:53,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:35:55,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:35:57,208 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:35:58,853 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:02,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:03,824 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:05,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:36:06,947 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:09,996 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:11,423 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:14,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:15,652 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:18,265 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:36:19,544 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:22,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:23,438 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:25,841 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:28,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:30,426 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:36:32,768 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:33,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:35,886 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:37,894 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:39,787 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:36:42,622 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:44,452 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:46,180 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:48,923 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:50,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:51,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:36:54,216 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.4326, 'learning_rate': 0.00023635838150289017, 'epoch': 3.9} [WARNING|modeling_bart.py:1051] 2022-03-26 22:36:57,581 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:37:01,233 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:37:04,845 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:37:08,350 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:37:11,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:37:15,377 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:37:18,906 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:37:22,342 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.5025, 'learning_rate': 0.00023618497109826586, 'epoch': 3.9} [WARNING|modeling_bart.py:1051] 2022-03-26 22:37:25,919 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:37:29,473 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 22:37:32,947 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:37:36,348 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:37:39,769 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:37:43,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:37:46,596 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:37:49,974 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:37:53,474 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:37:56,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:38:00,219 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:38:03,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:38:06,927 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:38:10,275 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:38:13,546 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:38:16,871 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:38:20,355 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:38:23,668 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:38:27,033 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:38:30,354 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:38:33,648 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:38:36,970 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:38:40,274 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:38:43,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.5277, 'learning_rate': 0.00023566473988439305, 'epoch': 3.91}
{'loss': 0.4715, 'learning_rate': 0.00023549132947976877, 'epoch': 3.92}
{'loss': 0.4018, 'learning_rate': 0.0002353179190751445, 'epoch': 3.92}
{'loss': 0.4456, 'learning_rate': 0.00023514450867052024, 'epoch': 3.93}
{'loss': 0.3791, 'learning_rate': 0.00023497109826589593, 'epoch': 3.93}
{'loss': 0.4253, 'learning_rate': 0.00023479768786127165, 'epoch': 3.94}
 39%|█████████████████████████████▉ | 879/2230 [5:29:46<9:22:49, 25.00s/it]
{'loss': 0.3124, 'learning_rate': 0.00023462427745664737, 'epoch': 3.94}
39%|█████████████████████████████▉ | 879/2230 [5:29:46<9:22:49, 25.00s/it] Setting `use_cache=False`...e computed-26 22:34:20,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 39%|█████████████████████████████▉ | 879/2230 [5:29:46<9:22:49, 25.00s/it] Setting `use_cache=False`...e computed-26 22:34:20,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 39%|█████████████████████████████▉ | 879/2230 [5:29:46<9:22:49, 25.00s/it] Setting `use_cache=False`...e computed-26 22:34:20,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 39%|█████████████████████████████▉ | 879/2230 [5:29:46<9:22:49, 25.00s/it] Setting `use_cache=False`...e computed-26 22:34:20,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.3515, 'learning_rate': 0.00023445086705202312, 'epoch': 3.95} 39%|█████████████████████████████▉ | 879/2230 [5:29:46<9:22:49, 25.00s/it] Setting `use_cache=False`...e computed-26 22:34:20,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 39%|█████████████████████████████▉ | 879/2230 [5:29:46<9:22:49, 25.00s/it] Setting `use_cache=False`...e computed-26 22:34:20,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 39%|█████████████████████████████▉ | 879/2230 [5:29:46<9:22:49, 25.00s/it] Setting `use_cache=False`...e computed-26 22:34:20,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 39%|█████████████████████████████▉ | 879/2230 [5:29:46<9:22:49, 25.00s/it] Setting `use_cache=False`...e computed-26 22:34:20,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 22:42:09,944 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.2906, 'learning_rate': 0.00023410404624277454, 'epoch': 3.96}
[WARNING|modeling_utils.py:388] 2022-03-26 22:42:40,521 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3742, 'learning_rate': 0.00023393063583815026, 'epoch': 3.96}
[WARNING|modeling_utils.py:388] 2022-03-26 22:43:07,500 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:43:15,819 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2932, 'learning_rate': 0.000233757225433526, 'epoch': 3.96}
[WARNING|modeling_utils.py:388] 2022-03-26 22:43:28,035 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:43:32,009 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:43:39,999 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:43:50,326 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:43:56,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:44:00,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 22:44:04,961 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 22:44:08,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:44:14,784 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:44:17,100 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2559, 'learning_rate': 0.00023323699421965314, 'epoch': 3.98}
[WARNING|modeling_bart.py:1051] 2022-03-26 22:44:21,242 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:44:23,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:44:25,499 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:44:29,543 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:44:31,582 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:44:33,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
40%|██████████████████████████████▎ | 888/2230 [5:33:03<7:33:25, 20.27s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 22:44:35,690 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:44:37,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:44:39,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:44:41,281 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:44:43,049 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:44:44,814 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:44:48,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
40%|██████████████████████████████▎ | 889/2230 [5:33:17<6:53:10, 18.49s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 22:44:49,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:44:51,387 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:44:52,913 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:44:55,832 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:44:57,222 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:44:59,877 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
40%|██████████████████████████████▎ | 890/2230 [5:33:28<6:06:29, 16.41s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 22:45:01,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:45:03,793 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:45:04,964 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:45:07,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
40%|██████████████████████████████▎ | 891/2230 [5:33:37<5:17:00, 14.20s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 22:45:10,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:45:12,032 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:45:13,749 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:45:16,113 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
40%|██████████████████████████████▍ | 892/2230 [5:33:44<4:26:36, 11.96s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 22:45:17,996 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:45:21,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:45:25,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:45:29,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:45:32,650 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:45:36,256 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:45:39,826 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:45:43,434 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
40%|██████████████████████████████▍ | 893/2230 [5:34:13<6:22:09, 17.15s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:45:50,664 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:45:54,203 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:45:57,729 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:46:01,263 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:46:04,824 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 22:46:09,334 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.5956, 'learning_rate': 0.0002320231213872832, 'epoch': 4.01}
{'loss': 0.4446, 'learning_rate': 0.00023184971098265893, 'epoch': 4.01}
 40%|██████████████████████████████▌ | 896/2230 [5:35:38<9:00:44, 24.32s/it]
{'loss': 0.3546, 'learning_rate': 0.00023167630057803465, 'epoch': 4.02}
{'loss': 0.3401, 'learning_rate': 0.0002315028901734104, 'epoch': 4.02}
{'loss': 0.3294, 'learning_rate': 0.00023132947976878612, 'epoch': 4.03}
{'loss': 0.3047, 'learning_rate': 0.00023115606936416181, 'epoch': 4.03}
{'loss': 0.2599, 'learning_rate': 0.00023098265895953754, 'epoch': 4.04}
{'loss': 0.2687, 'learning_rate': 0.00023080924855491328, 'epoch': 4.04}
{'loss': 0.2312, 'learning_rate': 0.000230635838150289, 'epoch': 4.04}
40%|██████████████████████████████▌ | 896/2230 [5:35:38<9:00:44, 24.32s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 40%|██████████████████████████████▌ | 896/2230 [5:35:38<9:00:44, 24.32s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 40%|██████████████████████████████▌ | 896/2230 [5:35:38<9:00:44, 24.32s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 40%|██████████████████████████████▌ | 896/2230 [5:35:38<9:00:44, 24.32s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 40%|██████████████████████████████▌ | 896/2230 [5:35:38<9:00:44, 24.32s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 40%|██████████████████████████████▌ | 896/2230 [5:35:38<9:00:44, 24.32s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 40%|██████████████████████████████▌ | 896/2230 [5:35:38<9:00:44, 24.32s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 40%|██████████████████████████████▌ | 896/2230 [5:35:38<9:00:44, 24.32s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.2279, 'learning_rate': 0.00023046242774566472, 'epoch': 4.05} 40%|██████████████████████████████▌ | 896/2230 [5:35:38<9:00:44, 24.32s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 40%|██████████████████████████████▌ | 896/2230 [5:35:38<9:00:44, 24.32s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 40%|██████████████████████████████▌ | 896/2230 [5:35:38<9:00:44, 24.32s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 40%|██████████████████████████████▌ | 896/2230 [5:35:38<9:00:44, 24.32s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 40%|██████████████████████████████▌ | 896/2230 [5:35:38<9:00:44, 24.32s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 40%|██████████████████████████████▌ | 896/2230 [5:35:38<9:00:44, 24.32s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 40%|██████████████████████████████▌ | 896/2230 [5:35:38<9:00:44, 24.32s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 40%|██████████████████████████████▌ | 896/2230 [5:35:38<9:00:44, 24.32s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
40%|██████████████████████████████▌ | 896/2230 [5:35:38<9:00:44, 24.32s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 40%|██████████████████████████████▌ | 896/2230 [5:35:38<9:00:44, 24.32s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 40%|██████████████████████████████▌ | 896/2230 [5:35:38<9:00:44, 24.32s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.2295, 'learning_rate': 0.00023028901734104042, 'epoch': 4.05} 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1895, 'learning_rate': 0.00023011560693641617, 'epoch': 4.06} 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.236, 'learning_rate': 0.0002299421965317919, 'epoch': 4.06} 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.2184, 'learning_rate': 0.0002297687861271676, 'epoch': 4.07} 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1909, 'learning_rate': 0.00022959537572254333, 'epoch': 4.07} 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▊ | 904/2230 [5:39:13<9:44:24, 26.44s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1998, 'learning_rate': 0.00022942196531791908, 'epoch': 4.08} 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1957, 'learning_rate': 0.00022924855491329477, 'epoch': 4.08} 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1838, 'learning_rate': 0.0002290751445086705, 'epoch': 4.09} 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|██████████████████████████████▉ | 909/2230 [5:41:22<9:29:42, 25.88s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████ | 912/2230 [5:42:38<9:17:40, 25.39s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████ | 912/2230 [5:42:38<9:17:40, 25.39s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1752, 'learning_rate': 0.0002289017341040462, 'epoch': 4.09} 41%|███████████████████████████████ | 912/2230 [5:42:38<9:17:40, 25.39s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████ | 912/2230 [5:42:38<9:17:40, 25.39s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████ | 912/2230 [5:42:38<9:17:40, 25.39s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
41%|███████████████████████████████ | 912/2230 [5:42:38<9:17:40, 25.39s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████ | 912/2230 [5:42:38<9:17:40, 25.39s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████ | 912/2230 [5:42:38<9:17:40, 25.39s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████ | 912/2230 [5:42:38<9:17:40, 25.39s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████ | 912/2230 [5:42:38<9:17:40, 25.39s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████ | 912/2230 [5:42:38<9:17:40, 25.39s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████ | 912/2230 [5:42:38<9:17:40, 25.39s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████ | 912/2230 [5:42:38<9:17:40, 25.39s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████ | 912/2230 [5:42:38<9:17:40, 25.39s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... {'loss': 0.202, 'learning_rate': 0.00022872832369942196, 'epoch': 4.09} 41%|███████████████████████████████ | 912/2230 [5:42:38<9:17:40, 25.39s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████ | 912/2230 [5:42:38<9:17:40, 25.39s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████ | 912/2230 [5:42:38<9:17:40, 25.39s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████ | 912/2230 [5:42:38<9:17:40, 25.39s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████ | 912/2230 [5:42:38<9:17:40, 25.39s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████ | 912/2230 [5:42:38<9:17:40, 25.39s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████ | 912/2230 [5:42:38<9:17:40, 25.39s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████ | 912/2230 [5:42:38<9:17:40, 25.39s/it] Setting `use_cache=False`...1] 2022-03-26 22:45:47,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.1954, 'learning_rate': 0.00022855491329479768, 'epoch': 4.1}
{'loss': 0.1885, 'learning_rate': 0.00022838150289017337, 'epoch': 4.1}
{'loss': 0.1631, 'learning_rate': 0.0002282080924855491, 'epoch': 4.11}
{'loss': 0.1711, 'learning_rate': 0.00022803468208092484, 'epoch': 4.11}
[WARNING|modeling_utils.py:388] 2022-03-26 22:56:23,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:56:34,335 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1738, 'learning_rate': 0.00022786127167630056, 'epoch': 4.12}
{'loss': 0.1571, 'learning_rate': 0.00022768786127167628, 'epoch': 4.12}
{'loss': 0.1176, 'learning_rate': 0.00022751445086705198, 'epoch': 4.13}
{'loss': 0.1831, 'learning_rate': 0.00022734104046242772, 'epoch': 4.13}
{'loss': 0.1878, 'learning_rate': 0.00022716763005780344, 'epoch': 4.13}
[WARNING|modeling_utils.py:388] 2022-03-26 22:58:15,999 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:58:20,239 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 22:58:36,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 22:58:42,837 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 22:58:47,034 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
 41%|███████████████████████████████▍ | 924/2230 [5:47:24<8:22:19, 23.08s/it]
{'loss': 0.2019, 'learning_rate': 0.00022682080924855489, 'epoch': 4.14}
{'loss': 0.1753, 'learning_rate': 0.00022664739884393063, 'epoch': 4.15}
[WARNING|modeling_utils.py:388] 2022-03-26 22:59:29,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
42%|███████████████████████████████▌ | 926/2230 [5:48:09<8:12:45, 22.67s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 22:59:49,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1571, 'learning_rate': 0.00022630057803468205, 'epoch': 4.16}
[WARNING|modeling_utils.py:388] 2022-03-26 23:00:08,313 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
42%|███████████████████████████████▋ | 928/2230 [5:48:51<7:54:26, 21.86s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 23:00:26,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:00:36,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:00:43,062 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1784, 'learning_rate': 0.00022595375722543352, 'epoch': 4.17}
[WARNING|modeling_utils.py:388] 2022-03-26 23:00:46,976 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:00:53,185 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:00:59,342 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:01:01,782 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1372, 'learning_rate': 0.00022578034682080924, 'epoch': 4.17}
[WARNING|modeling_bart.py:1051] 2022-03-26 23:01:09,914 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 23:01:13,916 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 23:01:18,232 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 23:01:22,186 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1602, 'learning_rate': 0.00022560693641618496, 'epoch': 4.17}
[WARNING|modeling_bart.py:1051] 2022-03-26 23:01:34,249 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:01:36,530 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:01:38,771 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:01:44,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 23:01:48,133 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:01:50,251 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:01:52,368 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:01:54,457 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 23:01:58,305 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
42%|███████████████████████████████▊ | 933/2230 [5:50:27<6:54:46, 19.19s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:00,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:02,560 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:04,589 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:06,623 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:08,598 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:10,525 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:12,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:14,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
42%|███████████████████████████████▊ | 934/2230 [5:50:43<6:33:12, 18.20s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:16,335 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:18,227 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:20,094 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:21,950 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:23,791 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:25,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:29,200 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
42%|███████████████████████████████▊ | 935/2230 [5:50:58<6:11:01, 17.19s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:31,122 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:32,920 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:34,673 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:36,411 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:38,174 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:39,867 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:43,210 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
42%|███████████████████████████████▉ | 936/2230 [5:51:12<5:49:29, 16.21s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:44,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:46,586 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:48,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:51,292 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:52,832 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:55,845 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
42%|███████████████████████████████▉ | 937/2230 [5:51:24<5:25:12, 15.09s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:57,382 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:02:58,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:01,574 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:03,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:05,059 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:07,687 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
42%|███████████████████████████████▉ | 938/2230 [5:51:36<5:03:17, 14.08s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:09,124 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:11,652 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:14,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:15,249 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:17,571 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:18,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:19,906 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:22,049 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:23,109 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:25,144 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
42%|████████████████████████████████ | 940/2230 [5:51:54<4:07:10, 11.50s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:27,203 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:29,974 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:31,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:33,580 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:34,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:35,387 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:37,778 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:40,014 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
42%|████████████████████████████████ | 942/2230 [5:52:08<3:15:01, 9.09s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:41,855 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:45,587 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:49,250 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:52,910 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:03:56,622 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:04:00,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:04:03,914 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:04:07,519 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
42%|████████████████████████████████▏ | 943/2230 [5:52:37<5:25:22, 15.17s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 23:04:11,195 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:04:14,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:04:18,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:04:21,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:04:25,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:04:29,044 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:04:33,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:04:37,182 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
42%|████████████████████████████████▏ | 944/2230 [5:53:07<6:58:13, 19.51s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 23:04:40,867 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:04:44,334 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:04:47,834 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:04:51,338 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:04:54,858 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:04:58,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:05:01,882 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:05:05,403 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
42%|████████████████████████████████▏ | 945/2230 [5:53:35<7:53:53, 22.13s/it][WARNING|modeling_bart.py:1051] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:12,411 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:12,411 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:15,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:15,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:19,286 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:22,710 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:22,710 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.3612, 'learning_rate': 0.0002230057803468208, 'epoch': 4.24} [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.37, 'learning_rate': 0.00022283236994219652, 'epoch': 4.25} [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.2676, 'learning_rate': 0.00022265895953757224, 'epoch': 4.25} [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.2842, 'learning_rate': 0.00022248554913294798, 'epoch': 4.26} [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:05:26,173 >> `use_cache=True` is incompatible with gradient checkpointing. 
{'loss': 0.3132, 'learning_rate': 0.00022231213872832368, 'epoch': 4.26}
43%|████████████████████████████████▍ | 951/2230 [5:56:20<9:30:00, 26.74s/it]
{'loss': 0.2282, 'learning_rate': 0.0002221387283236994, 'epoch': 4.26}
43%|████████████████████████████████▍ | 952/2230 [5:56:47<9:29:49, 26.75s/it]
{'loss': 0.238, 'learning_rate': 0.00022196531791907512, 'epoch': 4.27}
{'loss': 0.2261, 'learning_rate': 0.00022179190751445087, 'epoch': 4.27}
{'loss': 0.2274, 'learning_rate': 0.0002216184971098266, 'epoch': 4.28}
{'loss': 0.2562, 'learning_rate': 0.00022144508670520228, 'epoch': 4.28}
{'loss': 0.1999, 'learning_rate': 0.000221271676300578, 'epoch': 4.29}
{'loss': 0.1859, 'learning_rate': 0.00022109826589595375, 'epoch': 4.29}
{'loss': 0.1847, 'learning_rate': 0.00022092485549132947, 'epoch': 4.3}
43%|████████████████████████████████▋ | 959/2230 [5:59:49<9:10:16, 25.98s/it]
{'loss': 0.2134, 'learning_rate': 0.0002207514450867052, 'epoch': 4.3}
{'loss': 0.188, 'learning_rate': 0.00022057803468208088, 'epoch': 4.3}
{'loss': 0.2077, 'learning_rate': 0.00022040462427745663, 'epoch': 4.31}
43%|████████████████████████████████▊ | 962/2230 [6:01:05<8:57:30, 25.43s/it]
{'loss': 0.1951, 'learning_rate': 0.00022023121387283235, 'epoch': 4.31}
43%|████████████████████████████████▊ | 963/2230 [6:01:31<9:00:28, 25.59s/it]
{'loss': 0.1824, 'learning_rate': 0.00022005780346820807, 'epoch': 4.32}
43%|████████████████████████████████▊ | 964/2230 [6:01:56<8:54:42, 25.34s/it]
{'loss': 0.1861, 'learning_rate': 0.0002198843930635838, 'epoch': 4.32}
[WARNING|modeling_utils.py:388] 2022-03-26 23:13:51,086 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1701, 'learning_rate': 0.00021971098265895954, 'epoch': 4.33}
{'loss': 0.1884, 'learning_rate': 0.00021953757225433524, 'epoch': 4.33}
{'loss': 0.152, 'learning_rate': 0.00021936416184971096, 'epoch': 4.34}
{'loss': 0.1824, 'learning_rate': 0.00021919075144508668, 'epoch': 4.34}
{'loss': 0.1769, 'learning_rate': 0.00021901734104046243, 'epoch': 4.35}
[WARNING|modeling_bart.py:1051] 2022-03-26 23:15:48,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1704, 'learning_rate': 0.00021884393063583815, 'epoch': 4.35}
44%|█████████████████████████████████ | 971/2230 [6:04:44<8:23:14, 23.98s/it]
{'loss': 0.1562, 'learning_rate': 0.00021867052023121384, 'epoch': 4.35}
[WARNING|modeling_bart.py:1051] 2022-03-26 23:16:29,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1738, 'learning_rate': 0.00021849710982658956, 'epoch': 4.36}
[WARNING|modeling_utils.py:388] 2022-03-26 23:16:55,821 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1492, 'learning_rate': 0.0002183236994219653, 'epoch': 4.36}
 44%|█████████████████████████████████▏ | 974/2230 [6:05:53<8:07:52, 23.31s/it]
{'loss': 0.1489, 'learning_rate': 0.00021815028901734103, 'epoch': 4.37}
{'loss': 0.1694, 'learning_rate': 0.00021797687861271675, 'epoch': 4.37}
[WARNING|modeling_utils.py:388] 2022-03-26 23:18:05,451 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1627, 'learning_rate': 0.00021780346820809247, 'epoch': 4.38}
[WARNING|modeling_utils.py:388] 2022-03-26 23:18:21,507 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:18:31,994 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1595, 'learning_rate': 0.00021763005780346822, 'epoch': 4.38}
[WARNING|modeling_utils.py:388] 2022-03-26 23:18:42,530 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:18:46,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:18:52,944 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.165, 'learning_rate': 0.0002174566473988439, 'epoch': 4.39}
[WARNING|modeling_utils.py:388] 2022-03-26 23:18:56,911 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:19:07,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:19:13,420 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1581, 'learning_rate': 0.00021728323699421963, 'epoch': 4.39}
[WARNING|modeling_utils.py:388] 2022-03-26 23:19:23,539 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:19:33,387 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:19:35,930 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 23:19:40,371 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 23:19:48,034 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:19:54,006 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 23:20:00,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:20:02,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 23:20:06,457 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:20:08,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
 44%|█████████████████████████████████▍ | 982/2230 [6:08:40<6:59:30, 20.17s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 23:20:14,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:20:16,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:20:18,777 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:20:20,925 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 23:20:24,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:20:26,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:20:29,062 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:20:31,237 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:20:33,315 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:20:35,373 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:20:37,441 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:20:39,482 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:20:41,481 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:20:43,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:20:45,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:20:47,507 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:20:49,378 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:20:51,272 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:20:53,146 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:20:54,981 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:20:56,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:20:58,587 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:00,369 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:03,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:05,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:07,395 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:09,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:12,340 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:13,935 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:15,667 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:17,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:20,357 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:21,877 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:23,363 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:26,257 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:27,757 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:30,513 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:31,863 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:33,898 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:36,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:37,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:40,350 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:42,707 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:44,977 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:46,107 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:48,405 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:50,565 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:52,650 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:54,648 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:56,687 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:21:58,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:22:00,364 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:22:02,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:22:04,789 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:22:06,430 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:22:08,654 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:22:09,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:22:12,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:22:16,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:22:20,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:22:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:22:27,392 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:22:30,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:22:34,498 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:22:38,060 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.9389, 'learning_rate': 0.00021485549132947972, 'epoch': 4.45}
[WARNING|modeling_bart.py:1051] 2022-03-26 23:22:41,717 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:22:45,275 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:22:48,790 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:22:52,334 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:22:55,878 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:22:59,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:23:03,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:23:07,427 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.619, 'learning_rate': 0.00021468208092485547, 'epoch': 4.46}
[WARNING|modeling_bart.py:1051] 2022-03-26 23:23:11,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:23:14,671 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:23:18,138 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:23:21,654 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:23:25,120 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:23:28,611 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:23:32,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:23:35,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:23:39,059 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:23:42,559 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:23:46,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:23:49,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:23:52,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:23:56,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:23:56,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:23:56,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.3153, 'learning_rate': 0.0002143352601156069, 'epoch': 4.47} [WARNING|modeling_bart.py:1051] 2022-03-26 23:23:56,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:23:56,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:23:56,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:23:56,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:23:56,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
45%|█████████████████████████████████▉ | 997/2230 [6:12:59<8:29:21, 24.79s/it]
{'loss': 0.38, 'learning_rate': 0.00021416184971098263, 'epoch': 4.47}
45%|██████████████████████████████████ | 998/2230 [6:13:26<8:43:50, 25.51s/it]
{'loss': 0.2856, 'learning_rate': 0.00021398843930635838, 'epoch': 4.48}
{'loss': 0.3066, 'learning_rate': 0.0002138150289017341, 'epoch': 4.48}
[INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642
[INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 23:25:53,421 >> Num examples = 2642 | 998/2230 [6:13:26<8:43:50, 25.51s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
03/26/2022 23:35:24 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow
{'eval_loss': 0.3967271149158478, 'eval_wer': 0.12705275684252282, 'eval_runtime': 570.6501, 'eval_samples_per_second': 4.63, 'eval_steps_per_second': 0.58, 'epoch': 4.48}
{'loss': 0.2367, 'learning_rate': 0.00021346820809248551, 'epoch': 4.49}
45%|████████████████████████████████▊ | 1002/2230 [6:26:51<58:59:50, 172.96s/it]
{'loss': 0.219, 'learning_rate': 0.00021329479768786126, 'epoch': 4.49}
{'loss': 0.1921, 'learning_rate': 0.00021312138728323698, 'epoch': 4.5}
{'loss': 0.1953, 'learning_rate': 0.0002129479768786127, 'epoch': 4.5}
{'loss': 0.2085, 'learning_rate': 0.0002127745664739884, 'epoch': 4.51}
45%|█████████████████████████████████▍ | 1006/2230 [6:28:39<21:05:19, 62.03s/it]
{'loss': 0.1831, 'learning_rate': 0.00021260115606936414, 'epoch': 4.51}
45%|█████████████████████████████████▍ | 1007/2230 [6:29:06<17:28:24, 51.43s/it]
{'loss': 0.2097, 'learning_rate': 0.00021242774566473987, 'epoch': 4.52}
{'loss': 0.1797, 'learning_rate': 0.00021225433526011559, 'epoch': 4.52}
{'loss': 0.1994, 'learning_rate': 0.0002120809248554913, 'epoch': 4.52}
[WARNING|modeling_bart.py:1051] 2022-03-26 23:41:34,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.173, 'learning_rate': 0.00021190751445086705, 'epoch': 4.53}
{'loss': 0.1891, 'learning_rate': 0.00021173410404624275, 'epoch': 4.53}
{'loss': 0.1908, 'learning_rate': 0.00021156069364161847, 'epoch': 4.54}
{'loss': 0.2069, 'learning_rate': 0.0002113872832369942, 'epoch': 4.54}
{'loss': 0.1566, 'learning_rate': 0.00021121387283236994, 'epoch': 4.55}
{'loss': 0.1691, 'learning_rate': 0.00021104046242774566, 'epoch': 4.55}
{'loss': 0.1543, 'learning_rate': 0.00021086705202312135, 'epoch': 4.56}
{'loss': 0.162, 'learning_rate': 0.00021069364161849707, 'epoch': 4.56}
[WARNING|modeling_utils.py:388] 2022-03-26 23:45:13,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1871, 'learning_rate': 0.00021052023121387282, 'epoch': 4.57}
 46%|██████████████████████████████████▎ | 1019/2230 [6:34:06<8:22:29, 24.90s/it]
{'loss': 0.179, 'learning_rate': 0.00021034682080924854, 'epoch': 4.57}
{'loss': 0.1806, 'learning_rate': 0.00021017341040462426, 'epoch': 4.57}
{'loss': 0.1621, 'learning_rate': 0.00020999999999999998, 'epoch': 4.58}
{'loss': 0.1616, 'learning_rate': 0.00020982658959537573, 'epoch': 4.58}
{'loss': 0.1854, 'learning_rate': 0.00020965317919075142, 'epoch': 4.59}
{'loss': 0.1454, 'learning_rate': 0.00020947976878612714, 'epoch': 4.59}
{'loss': 0.2193, 'learning_rate': 0.00020930635838150286, 'epoch': 4.6}
[WARNING|modeling_utils.py:388] 2022-03-26 23:48:10,025 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:48:10,025 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 23:48:10,025 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1518, 'learning_rate': 0.0002091329479768786, 'epoch': 4.6} [WARNING|modeling_utils.py:388] 2022-03-26 23:48:10,025 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 23:48:26,219 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 23:48:26,219 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 23:48:26,219 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 23:48:26,219 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 23:48:26,219 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 23:48:26,219 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 23:48:26,219 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 23:48:40,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 23:48:40,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1593, 'learning_rate': 0.00020895953757225433, 'epoch': 4.61} [WARNING|modeling_utils.py:388] 2022-03-26 23:48:44,906 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 23:48:44,906 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 23:48:48,877 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:48:56,818 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1579, 'learning_rate': 0.00020878612716763003, 'epoch': 4.61}
[WARNING|modeling_utils.py:388] 2022-03-26 23:49:07,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:49:11,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:49:21,302 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1378, 'learning_rate': 0.00020861271676300575, 'epoch': 4.61}
[WARNING|modeling_bart.py:1051] 2022-03-26 23:49:29,756 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:49:35,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:49:42,105 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.11, 'learning_rate': 0.0002084393063583815, 'epoch': 4.62}
[WARNING|modeling_bart.py:1051] 2022-03-26 23:49:48,387 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:49:54,467 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:50:00,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
46%|██████████████████████████████████▋ | 1031/2230 [6:38:30<6:52:52, 20.66s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 23:50:04,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 23:50:10,922 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:50:16,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 23:50:20,618 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1464, 'learning_rate': 0.00020809248554913294, 'epoch': 4.63}
[WARNING|modeling_bart.py:1051] 2022-03-26 23:50:24,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:50:27,124 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 23:50:30,878 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:50:33,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 23:50:35,227 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 23:50:39,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:50:41,419 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:50:43,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:50:45,640 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:50:47,730 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:50:49,785 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:50:51,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:50:53,828 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:50:55,845 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:50:57,932 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:50:59,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:01,867 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:03,796 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:05,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:07,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:09,654 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:11,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:13,461 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:15,274 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:17,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:20,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:22,328 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:24,001 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:25,666 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:27,424 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:30,680 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:32,291 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:33,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:37,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:38,565 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:40,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:41,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:44,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:46,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:48,249 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:50,988 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:52,498 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:55,121 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:56,383 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:51:58,826 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:52:01,160 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:52:03,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:52:04,608 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:52:06,768 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:52:08,832 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:52:10,910 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:52:12,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:52:14,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:52:16,510 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:52:19,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:52:20,898 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:52:23,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:52:23,901 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:52:27,361 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:52:30,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:52:30,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:52:34,608 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:52:34,608 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:52:38,253 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:52:41,890 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:52:41,890 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:52:45,506 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:52:49,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:52:52,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.5592, 'learning_rate': 0.0002061849710982659, 'epoch': 4.68}
[WARNING|modeling_bart.py:1051] 2022-03-26 23:52:56,397 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:52:59,937 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:53:03,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:53:07,031 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:53:10,588 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:53:14,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:53:18,665 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:53:22,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.4164, 'learning_rate': 0.00020601156069364158, 'epoch': 4.68}
[WARNING|modeling_bart.py:1051] 2022-03-26 23:53:25,885 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:53:29,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:53:32,872 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:53:36,356 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:53:39,803 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:53:43,312 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:53:46,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:53:50,267 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:53:53,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:53:57,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:54:00,699 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.3236, 'learning_rate': 0.00020566473988439305, 'epoch': 4.69} [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.2769, 'learning_rate': 0.00020549132947976877, 'epoch': 4.7} [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 23:54:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 47%|███████████████████████████████████▏ | 1048/2230 [6:43:41<8:24:39, 25.62s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 47%|███████████████████████████████████▏ | 1048/2230 [6:43:41<8:24:39, 25.62s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.317, 'learning_rate': 0.0002053179190751445, 'epoch': 4.7} 47%|███████████████████████████████████▏ | 1048/2230 [6:43:41<8:24:39, 25.62s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 47%|███████████████████████████████████▏ | 1048/2230 [6:43:41<8:24:39, 25.62s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
47%|███████████████████████████████████▏ | 1048/2230 [6:43:41<8:24:39, 25.62s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 47%|███████████████████████████████████▏ | 1048/2230 [6:43:41<8:24:39, 25.62s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 47%|███████████████████████████████████▏ | 1048/2230 [6:43:41<8:24:39, 25.62s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 47%|███████████████████████████████████▏ | 1048/2230 [6:43:41<8:24:39, 25.62s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 47%|███████████████████████████████████▏ | 1048/2230 [6:43:41<8:24:39, 25.62s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 47%|███████████████████████████████████▏ | 1048/2230 [6:43:41<8:24:39, 25.62s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 47%|███████████████████████████████████▏ | 1048/2230 [6:43:41<8:24:39, 25.62s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 47%|███████████████████████████████████▏ | 1048/2230 [6:43:41<8:24:39, 25.62s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
47%|███████████████████████████████████▏ | 1048/2230 [6:43:41<8:24:39, 25.62s/it]
{'loss': 0.2551, 'learning_rate': 0.00020514450867052021, 'epoch': 4.7}
{'loss': 0.246, 'learning_rate': 0.00020497109826589596, 'epoch': 4.71}
47%|███████████████████████████████████▎ | 1051/2230 [6:45:03<8:45:03, 26.72s/it]
{'loss': 0.2346, 'learning_rate': 0.00020479768786127166, 'epoch': 4.71}
{'loss': 0.2146, 'learning_rate': 0.00020462427745664738, 'epoch': 4.72}
47%|███████████████████████████████████▍ | 1053/2230 [6:45:56<8:42:39, 26.64s/it]
{'loss': 0.1969, 'learning_rate': 0.0002044508670520231, 'epoch': 4.72}
{'loss': 0.243, 'learning_rate': 0.00020427745664739885, 'epoch': 4.73}
{'loss': 0.1915, 'learning_rate': 0.00020410404624277457, 'epoch': 4.73}
{'loss': 0.1672, 'learning_rate': 0.00020393063583815026, 'epoch': 4.74}
{'loss': 0.213, 'learning_rate': 0.00020375722543352598, 'epoch': 4.74}
{'loss': 0.178, 'learning_rate': 0.00020358381502890173, 'epoch': 4.74}
{'loss': 0.1977, 'learning_rate': 0.00020341040462427745, 'epoch': 4.75}
{'loss': 0.1956, 'learning_rate': 0.00020323699421965317, 'epoch': 4.75}
47%|███████████████████████████████████▍ | 1053/2230 [6:45:56<8:42:39, 26.64s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 47%|███████████████████████████████████▍ | 1053/2230 [6:45:56<8:42:39, 26.64s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 47%|███████████████████████████████████▍ | 1053/2230 [6:45:56<8:42:39, 26.64s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 48%|███████████████████████████████████▋ | 1061/2230 [6:49:23<8:18:49, 25.60s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 48%|███████████████████████████████████▋ | 1061/2230 [6:49:23<8:18:49, 25.60s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1645, 'learning_rate': 0.00020306358381502886, 'epoch': 4.76} 48%|███████████████████████████████████▋ | 1061/2230 [6:49:23<8:18:49, 25.60s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 48%|███████████████████████████████████▋ | 1061/2230 [6:49:23<8:18:49, 25.60s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 48%|███████████████████████████████████▋ | 1061/2230 [6:49:23<8:18:49, 25.60s/it] Setting `use_cache=False`...e computed-26 23:05:09,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
48%|███████████████████████████████████▋ | 1062/2230 [6:49:48<8:15:17, 25.44s/it]
{'loss': 0.19, 'learning_rate': 0.0002028901734104046, 'epoch': 4.76}
{'loss': 0.1341, 'learning_rate': 0.00020271676300578033, 'epoch': 4.77}
{'loss': 0.1533, 'learning_rate': 0.00020254335260115605, 'epoch': 4.77}
{'loss': 0.1639, 'learning_rate': 0.00020236994219653177, 'epoch': 4.78}
{'loss': 0.1612, 'learning_rate': 0.00020219653179190752, 'epoch': 4.78}
48%|███████████████████████████████████▉ | 1067/2230 [6:51:52<7:58:28, 24.69s/it]
{'loss': 0.1454, 'learning_rate': 0.00020184971098265893, 'epoch': 4.79}
{'loss': 0.1465, 'learning_rate': 0.00020167630057803466, 'epoch': 4.79}
{'loss': 0.1555, 'learning_rate': 0.0002015028901734104, 'epoch': 4.8}
[WARNING|modeling_bart.py:1051] 2022-03-27 00:04:45,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1481, 'learning_rate': 0.00020132947976878612, 'epoch': 4.8}
[WARNING|modeling_bart.py:1051] 2022-03-27 00:05:10,166 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
48%|████████████████████████████████████ | 1072/2230 [6:53:51<7:37:55, 23.73s/it]
{'loss': 0.1376, 'learning_rate': 0.00020115606936416184, 'epoch': 4.81}
[WARNING|modeling_utils.py:388] 2022-03-27 00:05:36,494 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
48%|████████████████████████████████████ | 1073/2230 [6:54:14<7:32:50, 23.48s/it]
{'loss': 0.1617, 'learning_rate': 0.00020098265895953754, 'epoch': 4.81}
48%|████████████████████████████████████ | 1074/2230 [6:54:37<7:27:43, 23.24s/it]
{'loss': 0.1281, 'learning_rate': 0.00020080924855491329, 'epoch': 4.82}
[WARNING|modeling_utils.py:388] 2022-03-27 00:06:29,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1483, 'learning_rate': 0.000200635838150289, 'epoch': 4.82}
[WARNING|modeling_utils.py:388] 2022-03-27 00:06:50,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:06:54,044 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1589, 'learning_rate': 0.00020046242774566473, 'epoch': 4.83}
[WARNING|modeling_utils.py:388] 2022-03-27 00:06:58,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:07:08,719 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1597, 'learning_rate': 0.00020028901734104045, 'epoch': 4.83}
[WARNING|modeling_bart.py:1051] 2022-03-27 00:07:18,841 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 00:07:27,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1448, 'learning_rate': 0.0002001156069364162, 'epoch': 4.83}
[WARNING|modeling_bart.py:1051] 2022-03-27 00:07:43,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 00:07:51,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1361, 'learning_rate': 0.0001999421965317919, 'epoch': 4.84}
[WARNING|modeling_utils.py:388] 2022-03-27 00:08:05,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 00:08:14,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 00:08:18,287 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1457, 'learning_rate': 0.0001997687861271676, 'epoch': 4.84}
[WARNING|modeling_utils.py:388] 2022-03-27 00:08:24,370 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:08:30,307 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:08:32,650 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
48%|████████████████████████████████████▎ | 1081/2230 [6:57:04<6:33:11, 20.53s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 00:08:38,674 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 00:08:44,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:08:47,244 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:08:50,593 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:08:52,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 00:08:56,779 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:08:59,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:02,988 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:05,154 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:07,301 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:09,411 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:11,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
49%|████████████████████████████████████▍ | 1083/2230 [6:57:41<6:09:54, 19.35s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:13,717 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:15,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:17,873 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:19,900 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:21,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:23,903 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:25,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:27,792 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
49%|████████████████████████████████████▍ | 1084/2230 [6:57:57<5:51:16, 18.39s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:29,815 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:31,722 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:33,602 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:35,482 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:37,373 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:39,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:41,013 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
49%|████████████████████████████████████▍ | 1085/2230 [6:58:12<5:31:16, 17.36s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:44,720 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:46,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:48,222 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:49,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:51,661 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:53,378 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:56,644 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
49%|████████████████████████████████████▌ | 1086/2230 [6:58:25<5:10:03, 16.26s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:09:58,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:00,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:01,613 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:04,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:06,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:07,834 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
49%|████████████████████████████████████▌ | 1087/2230 [6:58:38<4:48:55, 15.17s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:10,937 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:12,363 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:15,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:17,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:18,644 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:21,298 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
49%|████████████████████████████████████▌ | 1088/2230 [6:58:50<4:29:43, 14.17s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:22,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:25,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:26,589 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:29,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:31,468 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
49%|████████████████████████████████████▋ | 1089/2230 [6:59:00<4:06:08, 12.94s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:32,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:35,037 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:37,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:39,426 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
49%|████████████████████████████████████▋ | 1090/2230 [6:59:09<3:42:54, 11.73s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:41,622 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:43,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:45,445 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:47,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
49%|████████████████████████████████████▋ | 1091/2230 [6:59:16<3:19:16, 10.50s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:49,208 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:51,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:53,381 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
49%|████████████████████████████████████▋ | 1092/2230 [6:59:23<2:57:10, 9.34s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:10:56,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:11:00,656 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:11:04,214 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:11:07,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:11:11,419 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:11:14,922 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:11:18,513 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:11:22,009 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
49%|████████████████████████████████████▊ | 1093/2230 [6:59:52<4:48:03, 15.20s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:11:25,587 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:11:29,110 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:11:32,586 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:11:35,981 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:11:39,409 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:11:42,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:11:47,225 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:11:50,731 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
49%|████████████████████████████████████▊ | 1094/2230 [7:00:21<6:04:37, 19.26s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:11:54,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:11:57,690 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:12:01,089 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:12:04,488 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:12:07,954 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:12:11,394 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:12:14,757 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:12:18,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
49%|████████████████████████████████████▊ | 1095/2230 [7:00:48<6:49:48, 21.66s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:12:21,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:12:24,974 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:12:28,405 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:12:31,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:12:35,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:12:38,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:12:41,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:12:41,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:12:21,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:12:41,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:12:21,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:12:41,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:12:21,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.2618, 'learning_rate': 0.00019699421965317917, 'epoch': 4.91} [WARNING|modeling_bart.py:1051] 2022-03-27 00:12:41,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:12:21,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:12:41,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:12:21,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:12:41,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:12:21,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:12:41,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:12:21,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 00:12:41,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:12:21,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:12:41,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:12:21,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:12:41,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:12:21,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:12:41,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:12:21,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:12:41,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:12:21,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:12:41,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:12:21,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:12:41,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:12:21,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.2376, 'learning_rate': 0.0001968208092485549, 'epoch': 4.92}
{'loss': 0.1721, 'learning_rate': 0.0001966473988439306, 'epoch': 4.92}
{'loss': 0.1815, 'learning_rate': 0.00019647398843930636, 'epoch': 4.93}
{'loss': 0.1885, 'learning_rate': 0.00019630057803468208, 'epoch': 4.93}
49%|█████████████████████████████████████ | 1101/2230 [7:03:25<7:57:59, 25.40s/it]
{'loss': 0.1934, 'learning_rate': 0.00019612716763005777, 'epoch': 4.94}
49%|█████████████████████████████████████ | 1102/2230 [7:03:50<7:54:56, 25.26s/it]
{'loss': 0.1607, 'learning_rate': 0.0001959537572254335, 'epoch': 4.94}
[WARNING|modeling_utils.py:388] 2022-03-27 00:15:38,858 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1666, 'learning_rate': 0.00019578034682080924, 'epoch': 4.95}
{'loss': 0.1609, 'learning_rate': 0.00019560693641618496, 'epoch': 4.95}
50%|█████████████████████████████████████▏ | 1105/2230 [7:05:01<7:35:18, 24.28s/it]
{'loss': 0.1719, 'learning_rate': 0.00019543352601156068, 'epoch': 4.96}
50%|█████████████████████████████████████▏ | 1106/2230 [7:05:24<7:27:41, 23.90s/it]
{'loss': 0.1764, 'learning_rate': 0.00019526011560693637, 'epoch': 4.96}
{'loss': 0.1512, 'learning_rate': 0.00019508670520231212, 'epoch': 4.96}
[WARNING|modeling_utils.py:388] 2022-03-27 00:17:27,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:17:31,708 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:17:35,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1631, 'learning_rate': 0.00019491329479768784, 'epoch': 4.97}
[WARNING|modeling_utils.py:388] 2022-03-27 00:17:58,107 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1507, 'learning_rate': 0.00019473988439306356, 'epoch': 4.97}
[WARNING|modeling_utils.py:388] 2022-03-27 00:18:08,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:18:14,424 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 00:18:18,825 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 00:18:22,794 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1329, 'learning_rate': 0.00019456647398843928, 'epoch': 4.98}
[WARNING|modeling_bart.py:1051] 2022-03-27 00:18:27,085 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 00:18:30,871 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 00:18:34,891 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:18:36,999 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:18:39,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 50%|█████████████████████████████████████▎ | 1111/2230 [7:07:08<6:21:22, 20.45s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:18:41,187 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:18:43,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:18:45,153 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:18:47,050 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:18:48,920 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:18:50,741 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:18:52,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 50%|█████████████████████████████████████▍ | 1112/2230 [7:07:23<5:50:58, 18.84s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:18:56,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:18:59,432 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:19:00,999 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:19:03,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:19:04,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:19:07,642 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 50%|█████████████████████████████████████▍ | 1113/2230 [7:07:36<5:18:30, 17.11s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:19:09,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:19:11,647 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:19:13,995 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:19:15,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 50%|█████████████████████████████████████▍ | 1114/2230 [7:07:46<4:35:16, 14.80s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:19:18,395 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:19:19,352 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:19:22,074 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:19:23,735 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 50%|█████████████████████████████████████▌ | 1115/2230 [7:07:52<3:51:12, 12.44s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:19:26,371 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:19:30,227 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:19:33,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:19:37,518 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:19:41,113 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:19:44,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:19:48,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:19:51,857 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 50%|█████████████████████████████████████▌ | 1116/2230 [7:08:22<5:24:58, 17.50s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:19:55,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:19:59,078 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:20:02,589 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:20:06,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:20:09,637 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.2395, 'learning_rate': 0.00019335260115606933, 'epoch': 5.01}
{'loss': 0.1913, 'learning_rate': 0.00019317919075144505, 'epoch': 5.01}
{'loss': 0.1692, 'learning_rate': 0.0001930057803468208, 'epoch': 5.02}
{'loss': 0.1818, 'learning_rate': 0.00019283236994219652, 'epoch': 5.02}
{'loss': 0.1619, 'learning_rate': 0.00019265895953757224, 'epoch': 5.03}
{'loss': 0.1145, 'learning_rate': 0.00019248554913294796, 'epoch': 5.03}
{'loss': 0.1634, 'learning_rate': 0.0001923121387283237, 'epoch': 5.04}
{'loss': 0.1414, 'learning_rate': 0.0001921387283236994, 'epoch': 5.04}
{'loss': 0.1077, 'learning_rate': 0.00019196531791907512, 'epoch': 5.04}
50%|█████████████████████████████████████▊ | 1126/2230 [7:12:56<8:10:45, 26.67s/it]
{'loss': 0.1166, 'learning_rate': 0.0001916184971098266, 'epoch': 5.05}
{'loss': 0.115, 'learning_rate': 0.0001914450867052023, 'epoch': 5.06}
51%|█████████████████████████████████████▉ | 1129/2230 [7:14:14<7:59:54, 26.15s/it]
{'loss': 0.1072, 'learning_rate': 0.000191271676300578, 'epoch': 5.06}
{'loss': 0.1018, 'learning_rate': 0.00019109826589595373, 'epoch': 5.07}
{'loss': 0.1254, 'learning_rate': 0.00019092485549132947, 'epoch': 5.07}
{'loss': 0.1188, 'learning_rate': 0.0001907514450867052, 'epoch': 5.08}
{'loss': 0.1018, 'learning_rate': 0.00019057803468208091, 'epoch': 5.08}
{'loss': 0.0953, 'learning_rate': 0.0001904046242774566, 'epoch': 5.09}
{'loss': 0.1073, 'learning_rate': 0.00019023121387283236, 'epoch': 5.09}
{'loss': 0.0956, 'learning_rate': 0.00019005780346820808, 'epoch': 5.09}
{'loss': 0.0952, 'learning_rate': 0.0001898843930635838, 'epoch': 5.1}
51%|██████████████████████████████████████▎ | 1138/2230 [7:18:02<7:38:19, 25.18s/it]
{'loss': 0.1017, 'learning_rate': 0.00018971098265895952, 'epoch': 5.1}
{'loss': 0.0923, 'learning_rate': 0.00018953757225433527, 'epoch': 5.11}
{'loss': 0.0987, 'learning_rate': 0.00018936416184971096, 'epoch': 5.11}
[WARNING|modeling_utils.py:388] 2022-03-27 00:30:46,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1126, 'learning_rate': 0.00018919075144508668, 'epoch': 5.12}
51%|██████████████████████████████████████▍ | 1142/2230 [7:19:38<7:20:20, 24.28s/it]
{'loss': 0.0983, 'learning_rate': 0.0001890173410404624, 'epoch': 5.12}
[WARNING|modeling_utils.py:388] 2022-03-27 00:31:15,109 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0825, 'learning_rate': 0.00018884393063583815, 'epoch': 5.13}
51%|██████████████████████████████████████▍ | 1144/2230 [7:20:26<7:16:32, 24.12s/it]
{'loss': 0.0934, 'learning_rate': 0.00018867052023121387, 'epoch': 5.13}
{'loss': 0.0904, 'learning_rate': 0.0001884971098265896, 'epoch': 5.13}
[WARNING|modeling_utils.py:388] 2022-03-27 00:32:31,768 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 00:32:44,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:32:48,295 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0749, 'learning_rate': 0.00018815028901734103, 'epoch': 5.14}
{'loss': 0.1007, 'learning_rate': 0.00018797687861271675, 'epoch': 5.15}
[WARNING|modeling_utils.py:388] 2022-03-27 00:33:39,244 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:33:43,264 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
52%|██████████████████████████████████████▋ | 1149/2230 [7:22:19<6:47:36, 22.62s/it]
{'loss': 0.0995, 'learning_rate': 0.00018780346820809247, 'epoch': 5.15}
[WARNING|modeling_utils.py:388] 2022-03-27 00:34:07,464 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0912, 'learning_rate': 0.0001876300578034682, 'epoch': 5.16}
[WARNING|modeling_utils.py:388] 2022-03-27 00:34:17,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:34:17,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:19:55,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:34:29,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:19:55,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:34:29,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:19:55,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:34:29,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:19:55,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:34:29,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:19:55,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0994, 'learning_rate': 0.00018745664739884394, 'epoch': 5.16} [WARNING|modeling_utils.py:388] 2022-03-27 00:34:29,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:19:55,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:34:40,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:19:55,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 00:34:40,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:19:55,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:34:40,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:19:55,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:34:40,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:19:55,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:34:40,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:19:55,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:34:50,490 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:19:55,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:34:50,490 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:19:55,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:34:50,490 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:19:55,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
52%|██████████████████████████████████████▋ | 1152/2230 [7:23:23<6:29:51, 21.70s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:34:56,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0708, 'learning_rate': 0.00018728323699421963, 'epoch': 5.17}
[WARNING|modeling_utils.py:388] 2022-03-27 00:35:04,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:35:10,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:35:16,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 00:35:21,164 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 00:35:25,202 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:35:31,151 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:35:33,473 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0811, 'learning_rate': 0.00018693641618497108, 'epoch': 5.17}
[WARNING|modeling_utils.py:388] 2022-03-27 00:35:39,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:35:41,793 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:35:47,585 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:35:49,881 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
52%|██████████████████████████████████████▊ | 1155/2230 [7:24:21<5:58:02, 19.98s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0847, 'learning_rate': 0.00018676300578034682, 'epoch': 5.18}
[WARNING|modeling_utils.py:388] 2022-03-27 00:35:57,853 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:00,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 00:36:04,032 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:36:06,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:36:08,388 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:12,040 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:14,134 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:18,182 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:20,238 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:22,275 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:24,290 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:26,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:28,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:30,414 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:32,315 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:34,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:36,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:38,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:39,848 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:41,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:43,506 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:45,439 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:47,220 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:48,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:52,439 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:54,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:55,810 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:36:57,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:00,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:02,364 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:03,906 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:07,051 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:08,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:10,073 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:13,074 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:14,475 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:17,180 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:18,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:21,114 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:22,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:25,036 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:27,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:28,635 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:30,895 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:33,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:34,217 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:37,193 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:39,204 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:41,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:43,139 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:44,920 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:46,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:48,596 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:51,210 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:52,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:54,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:37:58,347 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:38:02,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:38:05,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:38:09,444 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:38:09,444 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:38:13,093 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:38:13,093 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:38:16,692 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:38:16,692 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 00:38:20,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:38:20,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:38:20,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:38:23,969 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:38:27,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:38:27,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:38:31,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 00:38:31,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:38:34,725 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:38:38,185 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:38:38,185 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:38:41,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:38:41,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:38:45,311 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 00:38:48,807 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:38:52,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2619, 'learning_rate': 0.0001846820809248555, 'epoch': 5.23}
[WARNING|modeling_utils.py:388] 2022-03-27 00:38:55,964 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:38:59,480 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:39:03,055 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:39:06,545 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:39:10,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:39:13,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:39:17,105 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:39:20,652 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2016, 'learning_rate': 0.0001845086705202312, 'epoch': 5.24}
[WARNING|modeling_utils.py:388] 2022-03-27 00:39:24,154 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:39:27,659 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:39:31,131 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:39:34,533 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:39:38,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1805, 'learning_rate': 0.0001843352601156069, 'epoch': 5.24}
52%|███████████████████████████████████████▎ | 1170/2230 [7:28:45<7:23:39, 25.11s/it]
{'loss': 0.152, 'learning_rate': 0.00018416184971098263, 'epoch': 5.25}
{'loss': 0.1561, 'learning_rate': 0.00018398843930635838, 'epoch': 5.25}
{'loss': 0.135, 'learning_rate': 0.0001838150289017341, 'epoch': 5.26}
{'loss': 0.1462, 'learning_rate': 0.00018364161849710982, 'epoch': 5.26}
53%|███████████████████████████████████████▍ | 1174/2230 [7:30:33<7:47:53, 26.58s/it]
{'loss': 0.121, 'learning_rate': 0.00018346820809248552, 'epoch': 5.26} 53%|███████████████████████████████████████▍ | 1174/2230 [7:30:33<7:47:53, 26.58s/it]g-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 53%|███████████████████████████████████████▍ | 1174/2230 [7:30:33<7:47:53, 26.58s/it]g-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 53%|███████████████████████████████████████▍ | 1174/2230 [7:30:33<7:47:53, 26.58s/it]g-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 53%|███████████████████████████████████████▍ | 1174/2230 [7:30:33<7:47:53, 26.58s/it]g-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 53%|███████████████████████████████████████▍ | 1174/2230 [7:30:33<7:47:53, 26.58s/it]g-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 53%|███████████████████████████████████████▍ | 1174/2230 [7:30:33<7:47:53, 26.58s/it]g-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 53%|███████████████████████████████████████▍ | 1174/2230 [7:30:33<7:47:53, 26.58s/it]g-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 53%|███████████████████████████████████████▍ | 1174/2230 [7:30:33<7:47:53, 26.58s/it]g-point operations will not be computed-27 00:35:54,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.15, 'learning_rate': 0.00018329479768786124, 'epoch': 5.27}
[WARNING|modeling_utils.py:388] 2022-03-27 00:42:46,610 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1087, 'learning_rate': 0.00018312138728323698, 'epoch': 5.27}
{'loss': 0.1005, 'learning_rate': 0.0001829479768786127, 'epoch': 5.28}
{'loss': 0.1302, 'learning_rate': 0.00018277456647398843, 'epoch': 5.28}
53%|███████████████████████████████████████▋ | 1179/2230 [7:32:46<7:42:22, 26.40s/it]
{'loss': 0.1133, 'learning_rate': 0.00018260115606936412, 'epoch': 5.29}
{'loss': 0.0955, 'learning_rate': 0.00018242774566473987, 'epoch': 5.29}
53%|███████████████████████████████████████▋ | 1181/2230 [7:33:37<7:34:43, 26.01s/it]
{'loss': 0.0942, 'learning_rate': 0.0001822543352601156, 'epoch': 5.3}
53%|███████████████████████████████████████▊ | 1182/2230 [7:34:04<7:37:05, 26.17s/it]
{'loss': 0.1154, 'learning_rate': 0.0001820809248554913, 'epoch': 5.3}
{'loss': 0.1216, 'learning_rate': 0.00018190751445086703, 'epoch': 5.3}
{'loss': 0.0994, 'learning_rate': 0.00018173410404624278, 'epoch': 5.31}
{'loss': 0.0857, 'learning_rate': 0.00018156069364161847, 'epoch': 5.31}
{'loss': 0.1136, 'learning_rate': 0.0001813872832369942, 'epoch': 5.32}
{'loss': 0.1012, 'learning_rate': 0.0001812138728323699, 'epoch': 5.32}
{'loss': 0.095, 'learning_rate': 0.00018104046242774566, 'epoch': 5.33} [WARNING|modeling_utils.py:388] 2022-03-27 00:48:20,641 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0799, 'learning_rate': 0.00018086705202312138, 'epoch': 5.33}
[WARNING|modeling_bart.py:1051] 2022-03-27 00:48:51,589 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
53%|████████████████████████████████████████ | 1190/2230 [7:37:23<7:07:50, 24.68s/it] {'loss': 0.0927, 'learning_rate': 0.00018069364161849707, 'epoch': 5.34}
{'loss': 0.0887, 'learning_rate': 0.0001805202312138728, 'epoch': 5.34}
[WARNING|modeling_utils.py:388] 2022-03-27 00:49:42,553 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.0991, 'learning_rate': 0.00018034682080924854, 'epoch': 5.35}
{'loss': 0.1128, 'learning_rate': 0.00018017341040462426, 'epoch': 5.35}
{'loss': 0.0827, 'learning_rate': 0.00017999999999999998, 'epoch': 5.35}
{'loss': 0.0856, 'learning_rate': 0.0001798265895953757, 'epoch': 5.36} [WARNING|modeling_bart.py:1051] 2022-03-27 00:50:58,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
54%|████████████████████████████████████████▏ | 1196/2230 [7:39:44<6:45:09, 23.51s/it] {'loss': 0.085, 'learning_rate': 0.00017965317919075145, 'epoch': 5.36}
[WARNING|modeling_utils.py:388] 2022-03-27 00:51:33,244 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
54%|████████████████████████████████████████▎ | 1197/2230 [7:40:06<6:39:31, 23.21s/it] {'loss': 0.1049, 'learning_rate': 0.00017947976878612715, 'epoch': 5.37}
[WARNING|modeling_utils.py:388] 2022-03-27 00:51:59,606 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-27 00:52:03,775 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:52:07,856 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:52:24,084 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.0757, 'learning_rate': 0.0001791329479768786, 'epoch': 5.38}
{'loss': 0.0879, 'learning_rate': 0.00017895953757225434, 'epoch': 5.38} [WARNING|modeling_utils.py:388] 2022-03-27 00:52:48,832 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0971, 'learning_rate': 0.00017878612716763006, 'epoch': 5.39} [WARNING|modeling_utils.py:388] 2022-03-27 00:53:11,053 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-27 00:53:17,435 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:53:27,639 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 00:53:35,967 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:53:42,166 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 00:53:46,252 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.0709, 'learning_rate': 0.00017843930635838147, 'epoch': 5.39} [WARNING|modeling_bart.py:1051] 2022-03-27 00:53:54,489 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:53:58,491 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 00:54:02,033 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-27 00:54:04,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.0798, 'learning_rate': 0.00017826589595375722, 'epoch': 5.4} [WARNING|modeling_utils.py:388] 2022-03-27 00:54:10,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 00:54:14,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 00:54:18,454 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 00:54:22,632 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 54%|████████████████████████████████████████▌ | 1205/2230 [7:42:52<5:41:11, 19.97s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 00:54:26,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-27 00:54:28,779 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-27 00:54:30,975 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-27 00:54:33,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 00:54:37,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:54:39,110 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:54:41,232 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:54:43,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:54:45,514 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:54:49,514 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:54:51,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:54:53,528 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:54:55,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:54:57,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 54%|████████████████████████████████████████▌ | 1207/2230 [7:43:26<5:16:55, 18.59s/it] [WARNING|modeling_bart.py:1051] 2022-03-27 00:54:59,535 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:55:01,429 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:55:03,301 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:55:05,146 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:55:07,007 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:55:08,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:55:10,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 54%|████████████████████████████████████████▋ | 1208/2230 [7:43:41<4:57:59, 17.50s/it] [WARNING|modeling_bart.py:1051] 2022-03-27 00:55:14,358 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:55:16,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:55:17,828 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:55:19,528 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:55:21,203 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:55:22,859 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:55:26,120 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 54%|████████████████████████████████████████▋ | 1209/2230 [7:43:55<4:37:31, 16.31s/it] [WARNING|modeling_bart.py:1051] 2022-03-27 00:55:27,855 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:55:29,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:55:30,997 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:55:34,080 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:55:35,599 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:55:38,555 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
54%|████████████████████████████████████████▋ | 1210/2230 [7:44:07<4:16:44, 15.10s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:55:40,096 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:55:41,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:55:44,277 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:55:45,631 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:55:48,264 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
54%|████████████████████████████████████████▋ | 1211/2230 [7:44:18<3:55:15, 13.85s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:55:50,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:55:52,187 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:55:53,438 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:55:55,889 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:55:58,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
54%|████████████████████████████████████████▊ | 1212/2230 [7:44:28<3:33:41, 12.59s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:00,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:01,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:03,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:05,704 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:07,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
54%|████████████████████████████████████████▊ | 1213/2230 [7:44:37<3:16:43, 11.61s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:09,846 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:12,649 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:14,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:16,259 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:17,265 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:18,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:20,530 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:22,798 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
54%|████████████████████████████████████████▊ | 1215/2230 [7:44:51<2:35:27, 9.19s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:24,684 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:28,441 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:32,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:35,727 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:39,342 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:42,907 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:46,506 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:50,056 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
55%|████████████████████████████████████████▉ | 1216/2230 [7:45:20<4:16:27, 15.17s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:53,727 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:56:57,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:57:00,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:57:04,312 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:57:07,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:57:11,334 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:57:14,880 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:57:18,364 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
55%|████████████████████████████████████████▉ | 1217/2230 [7:45:48<5:22:20, 19.09s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:57:21,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:57:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:57:28,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:57:32,412 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:57:35,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:57:39,339 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:57:42,768 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:57:46,160 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
55%|████████████████████████████████████████▉ | 1218/2230 [7:46:16<6:06:03, 21.70s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:57:53,165 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:57:56,601 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:58:00,050 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:58:03,452 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:58:06,900 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 00:58:11,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1701, 'learning_rate': 0.00017566473988439303, 'epoch': 5.47}
{'loss': 0.1412, 'learning_rate': 0.00017549132947976878, 'epoch': 5.47}
{'loss': 0.1459, 'learning_rate': 0.0001753179190751445, 'epoch': 5.48}
[WARNING|modeling_bart.py:1051] 2022-03-27 00:58:11,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:58:11,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.146, 'learning_rate': 0.00017514450867052022, 'epoch': 5.48} [WARNING|modeling_bart.py:1051] 2022-03-27 00:58:11,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:58:11,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:58:11,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:58:11,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:58:11,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 00:58:11,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:58:11,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:58:11,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:58:11,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:58:11,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:58:11,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:58:11,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 00:58:11,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1425, 'learning_rate': 0.00017497109826589594, 'epoch': 5.48} [WARNING|modeling_bart.py:1051] 2022-03-27 00:58:11,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:58:11,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:58:11,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:58:11,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:58:11,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 00:58:11,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
55%|█████████████████████████████████████████▏ | 1224/2230 [7:49:00<7:22:52, 26.41s/it]
{'loss': 0.1307, 'learning_rate': 0.00017479768786127169, 'epoch': 5.49}
{'loss': 0.1416, 'learning_rate': 0.00017462427745664738, 'epoch': 5.49}
{'loss': 0.1239, 'learning_rate': 0.0001744508670520231, 'epoch': 5.5}
55%|█████████████████████████████████████████▏ | 1224/2230 [7:49:00<7:22:52, 26.41s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▏ | 1224/2230 [7:49:00<7:22:52, 26.41s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▏ | 1224/2230 [7:49:00<7:22:52, 26.41s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▏ | 1224/2230 [7:49:00<7:22:52, 26.41s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▏ | 1224/2230 [7:49:00<7:22:52, 26.41s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.135, 'learning_rate': 0.00017427745664739882, 'epoch': 5.5} 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0945, 'learning_rate': 0.00017410404624277457, 'epoch': 5.51} 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.1212, 'learning_rate': 0.0001739306358381503, 'epoch': 5.51} 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1213, 'learning_rate': 0.00017375722543352598, 'epoch': 5.52} 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 55%|█████████████████████████████████████████▎ | 1227/2230 [7:50:20<7:21:54, 26.44s/it] Setting `use_cache=False`...1] 2022-03-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.11, 'learning_rate': 0.0001735838150289017, 'epoch': 5.52}
55%|█████████████████████████████████████████▍ | 1232/2230 [7:52:29<7:11:37, 25.95s/it]
{'loss': 0.1107, 'learning_rate': 0.00017341040462427745, 'epoch': 5.52}
{'loss': 0.1042, 'learning_rate': 0.00017323699421965317, 'epoch': 5.53}
{'loss': 0.1009, 'learning_rate': 0.0001730635838150289, 'epoch': 5.53}
{'loss': 0.0995, 'learning_rate': 0.00017289017341040459, 'epoch': 5.54}
{'loss': 0.1199, 'learning_rate': 0.00017271676300578033, 'epoch': 5.54}
55%|█████████████████████████████████████████▌ | 1237/2230 [7:54:33<6:54:05, 25.02s/it]
{'loss': 0.1462, 'learning_rate': 0.00017254335260115605, 'epoch': 5.55}
{'loss': 0.0984, 'learning_rate': 0.00017236994219653178, 'epoch': 5.55}
56%|█████████████████████████████████████████▋ | 1239/2230 [7:55:23<6:51:44, 24.93s/it]
{'loss': 0.097, 'learning_rate': 0.0001721965317919075, 'epoch': 5.56}
{'loss': 0.1193, 'learning_rate': 0.00017202312138728324, 'epoch': 5.56}
{'loss': 0.1235, 'learning_rate': 0.00017184971098265894, 'epoch': 5.57}
{'loss': 0.1089, 'learning_rate': 0.00017167630057803466, 'epoch': 5.57}
{'loss': 0.1091, 'learning_rate': 0.00017150289017341038, 'epoch': 5.57}
[WARNING|modeling_utils.py:388] 2022-03-27 01:08:54,678 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0913, 'learning_rate': 0.00017132947976878613, 'epoch': 5.58}
[WARNING|modeling_bart.py:1051] 2022-03-27 01:09:02,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 01:09:09,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1071, 'learning_rate': 0.00017115606936416185, 'epoch': 5.58}
{'loss': 0.0796, 'learning_rate': 0.00017098265895953757, 'epoch': 5.59}
{'loss': 0.0782, 'learning_rate': 0.00017080924855491326, 'epoch': 5.59}
[WARNING|modeling_utils.py:388] 2022-03-27 01:10:25,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0974, 'learning_rate': 0.000170635838150289, 'epoch': 5.6}
[WARNING|modeling_utils.py:388] 2022-03-27 01:10:33,326 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:10:37,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0765, 'learning_rate': 0.00017046242774566473, 'epoch': 5.6}
[WARNING|modeling_utils.py:388] 2022-03-27 01:10:57,381 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:11:03,833 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0882, 'learning_rate': 0.00017028901734104045, 'epoch': 5.61}
[WARNING|modeling_utils.py:388] 2022-03-27 01:11:14,056 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
 56%|██████████████████████████████████████████ | 1251/2230 [7:59:57<5:57:27, 21.91s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 01:11:32,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:11:42,530 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
 56%|██████████████████████████████████████████ | 1252/2230 [8:00:18<5:49:54, 21.47s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 01:11:52,669 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:11:58,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:12:05,053 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:12:11,272 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:12:11,272 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 01:12:19,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 01:12:19,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 01:12:19,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 01:12:25,208 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 01:12:25,208 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:12:29,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 01:12:29,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0714, 'learning_rate': 0.00016959537572254333, 'epoch': 5.62} [WARNING|modeling_bart.py:1051] 2022-03-27 01:12:33,519 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 01:12:33,519 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:12:37,400 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:12:39,709 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:12:39,709 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 01:12:43,809 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 01:12:43,809 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:12:47,619 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:12:47,619 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:12:49,934 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:12:52,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:12:52,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 01:12:56,087 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 00:57:49,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 01:12:58,209 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:00,331 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:02,424 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:04,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:06,709 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:08,768 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:12,626 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:14,606 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:16,541 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:18,530 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:20,478 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
56%|██████████████████████████████████████████▎ | 1257/2230 [8:01:49<4:58:53, 18.43s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:22,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:24,498 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:26,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:28,281 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:30,101 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:31,918 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:33,718 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
56%|██████████████████████████████████████████▎ | 1258/2230 [8:02:04<4:41:33, 17.38s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:37,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:39,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:40,846 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:42,538 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:44,176 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:45,808 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:49,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
56%|██████████████████████████████████████████▎ | 1259/2230 [8:02:18<4:21:58, 16.19s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:50,737 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:52,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:53,903 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:56,958 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:13:58,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:01,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
57%|██████████████████████████████████████████▍ | 1260/2230 [8:02:30<4:03:06, 15.04s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:03,021 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:04,444 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:07,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:08,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:11,233 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
57%|██████████████████████████████████████████▍ | 1261/2230 [8:02:41<3:43:01, 13.81s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:13,923 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:15,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:17,650 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:18,858 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:21,205 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
57%|██████████████████████████████████████████▍ | 1262/2230 [8:02:51<3:23:36, 12.62s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:23,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:25,943 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:27,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:29,972 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:31,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
57%|██████████████████████████████████████████▍ | 1263/2230 [8:03:00<3:07:35, 11.64s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:33,072 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:34,958 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:36,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:39,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:40,423 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:41,295 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:42,958 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:45,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
57%|██████████████████████████████████████████▌ | 1265/2230 [8:03:14<2:27:49, 9.19s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:47,844 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:51,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:55,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:14:58,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:15:02,612 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:15:06,270 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:15:09,873 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:15:13,440 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
57%|██████████████████████████████████████████▌ | 1266/2230 [8:03:43<4:04:44, 15.23s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 01:15:17,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:15:20,641 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:15:24,178 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:15:27,752 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:15:31,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:15:34,770 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:15:38,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:15:41,757 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
57%|██████████████████████████████████████████▌ | 1267/2230 [8:04:12<5:07:25, 19.15s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 01:15:45,357 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:15:48,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:15:52,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:15:55,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:15:59,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:16:02,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:16:06,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:16:09,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
57%|██████████████████████████████████████████▋ | 1268/2230 [8:04:39<5:48:39, 21.75s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:16:16,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:16:19,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:16:23,325 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:16:26,722 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:16:30,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1353, 'learning_rate': 0.0001669942196531792, 'epoch': 5.69}
{'loss': 0.1578, 'learning_rate': 0.0001668208092485549, 'epoch': 5.7}
{'loss': 0.14, 'learning_rate': 0.0001666473988439306, 'epoch': 5.7}
{'loss': 0.1199, 'learning_rate': 0.00016647398843930633, 'epoch': 5.7}
57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it]
57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1321, 'learning_rate': 0.0001661271676300578, 'epoch': 5.71} 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1182, 'learning_rate': 0.0001659537572254335, 'epoch': 5.72} 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|██████████████████████████████████████████▊ | 1273/2230 [8:06:56<6:57:34, 26.18s/it] Setting `use_cache=False`...1] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:19:48,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:19:48,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1181, 'learning_rate': 0.00016578034682080922, 'epoch': 5.72} [WARNING|modeling_utils.py:388] 2022-03-27 01:19:48,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 01:19:48,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:19:48,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:19:48,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:19:48,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:19:48,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:19:48,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:19:48,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
57%|██████████████████████████████████████████▉ | 1277/2230 [8:08:43<7:00:08, 26.45s/it] {'loss': 0.1181, 'learning_rate': 0.00016560693641618496, 'epoch': 5.73}
{'loss': 0.1041, 'learning_rate': 0.00016543352601156068, 'epoch': 5.73}
{'loss': 0.1226, 'learning_rate': 0.0001652601156069364, 'epoch': 5.74}
{'loss': 0.1154, 'learning_rate': 0.0001650867052023121, 'epoch': 5.74}
{'loss': 0.1378, 'learning_rate': 0.00016491329479768785, 'epoch': 5.74}
{'loss': 0.1154, 'learning_rate': 0.00016473988439306357, 'epoch': 5.75}
{'loss': 0.1216, 'learning_rate': 0.0001645664739884393, 'epoch': 5.75}
{'loss': 0.1142, 'learning_rate': 0.000164393063583815, 'epoch': 5.76}
58%|███████████████████████████████████████████▏ | 1285/2230 [8:12:07<6:39:36, 25.37s/it] {'loss': 0.1157, 'learning_rate': 0.00016421965317919076, 'epoch': 5.76}
{'loss': 0.0988, 'learning_rate': 0.00016404624277456645, 'epoch': 5.77} [WARNING|modeling_utils.py:388] 2022-03-27 01:24:10,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
58%|███████████████████████████████████████████▎ | 1287/2230 [8:12:57<6:32:45, 24.99s/it] [WARNING|modeling_utils.py:388] 2022-03-27 01:24:39,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0961, 'learning_rate': 0.0001636994219653179, 'epoch': 5.78}
{'loss': 0.0861, 'learning_rate': 0.00016352601156069364, 'epoch': 5.78}
 58%|███████████████████████████████████████████▍ | 1290/2230 [8:14:10<6:25:03, 24.58s/it]
{'loss': 0.0907, 'learning_rate': 0.00016335260115606936, 'epoch': 5.78}
[WARNING|modeling_utils.py:388] 2022-03-27 01:25:47,103 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0893, 'learning_rate': 0.00016317919075144505, 'epoch': 5.79}
 58%|███████████████████████████████████████████▍ | 1292/2230 [8:14:58<6:17:22, 24.14s/it]
{'loss': 0.0836, 'learning_rate': 0.00016300578034682077, 'epoch': 5.79}
[WARNING|modeling_utils.py:388] 2022-03-27 01:26:50,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0774, 'learning_rate': 0.00016283236994219652, 'epoch': 5.8}
{'loss': 0.0944, 'learning_rate': 0.00016265895953757224, 'epoch': 5.8}
[WARNING|modeling_utils.py:388] 2022-03-27 01:27:28,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0802, 'learning_rate': 0.00016248554913294796, 'epoch': 5.81}
[WARNING|modeling_bart.py:1051] 2022-03-27 01:27:53,449 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0914, 'learning_rate': 0.00016231213872832368, 'epoch': 5.81}
{'loss': 0.0849, 'learning_rate': 0.00016213872832369943, 'epoch': 5.82}
[WARNING|modeling_bart.py:1051] 2022-03-27 01:27:53,449 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:28:36,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:28:36,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:28:36,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:28:36,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:28:36,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:28:36,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 01:28:36,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:28:36,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0907, 'learning_rate': 0.00016196531791907512, 'epoch': 5.82} [WARNING|modeling_utils.py:388] 2022-03-27 01:28:36,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:28:36,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:28:36,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:28:36,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:28:36,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 01:29:02,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
58%|███████████████████████████████████████████▋ | 1299/2230 [8:17:38<5:50:04, 22.56s/it]
{'loss': 0.0839, 'learning_rate': 0.00016179190751445085, 'epoch': 5.83}
[WARNING|modeling_bart.py:1051] 2022-03-27 01:29:23,285 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 01:29:28,988 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0997, 'learning_rate': 0.00016161849710982657, 'epoch': 5.83}
[WARNING|modeling_utils.py:388] 2022-03-27 01:29:39,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:29:43,240 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:29:53,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0792, 'learning_rate': 0.00016144508670520231, 'epoch': 5.83}
[WARNING|modeling_utils.py:388] 2022-03-27 01:30:03,887 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 01:30:09,731 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0732, 'learning_rate': 0.00016127167630057803, 'epoch': 5.84}
[WARNING|modeling_utils.py:388] 2022-03-27 01:30:17,820 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:30:24,029 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:30:30,125 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
58%|███████████████████████████████████████████▊ | 1303/2230 [8:19:01<5:24:49, 21.02s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 01:30:36,365 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:30:42,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 01:30:46,831 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 01:30:50,788 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0786, 'learning_rate': 0.00016092485549132945, 'epoch': 5.85}
[WARNING|modeling_utils.py:388] 2022-03-27 01:30:56,753 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:30:59,062 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 01:31:03,244 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 01:31:07,086 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:31:09,376 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 01:31:13,553 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:31:19,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 01:31:22,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:31:24,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:31:27,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:31:29,208 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:31:31,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:31:37,379 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:31:39,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:31:41,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:31:43,349 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:31:45,305 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:31:47,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:31:49,339 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:31:51,219 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:31:53,097 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:31:54,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:31:56,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:31:58,707 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:32:00,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:32:02,453 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:32:04,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:32:07,869 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:32:09,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:32:11,330 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:32:13,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:32:14,673 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:32:18,052 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:32:19,668 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:32:21,293 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:32:24,365 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:32:25,905 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:32:27,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:32:30,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:32:31,955 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:32:34,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:32:36,096 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:32:38,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:32:38,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:32:40,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:32:42,689 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:32:43,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:32:46,434 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 01:32:48,794 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:32:48,794 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:32:50,040 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:32:52,293 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:32:53,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:32:56,342 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:32:58,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 01:32:58,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:00,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:02,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:04,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:05,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:05,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:08,523 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 01:33:10,106 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:12,338 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:12,338 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:14,647 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:14,647 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:18,312 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:18,312 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 01:33:21,848 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:21,848 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:25,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:28,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:28,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:32,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:32,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 01:33:36,011 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:36,011 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:39,485 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:39,485 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:43,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:43,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:46,537 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 01:33:46,537 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:50,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:53,448 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:53,448 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:33:56,876 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:34:00,277 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:34:00,277 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 01:34:03,707 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 01:34:07,104 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1622, 'learning_rate': 0.00015867052023121387, 'epoch': 5.91}
[WARNING|modeling_utils.py:388] 2022-03-27 01:34:10,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 01:34:14,028 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 01:34:17,407 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:34:17,407 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:34:20,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:34:24,202 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:34:24,202 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:34:27,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:34:30,916 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 01:34:30,916 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:34:34,323 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:34:34,323 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:34:34,323 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:34:37,795 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:34:41,181 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:34:41,181 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 01:34:44,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:34:44,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:34:47,885 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:34:51,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:34:51,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:34:54,569 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 01:34:54,569 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 01:34:58,933 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1282, 'learning_rate': 0.0001583236994219653, 'epoch': 5.91}
{'loss': 0.1433, 'learning_rate': 0.000158150289017341, 'epoch': 5.92}
g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.118, 'learning_rate': 0.00015797687861271675, 'epoch': 5.92} g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1088, 'learning_rate': 0.00015780346820809248, 'epoch': 5.93} g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-27 01:16:13,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
59%|████████████████████████████████████████████▍ | 1323/2230 [8:25:15<6:21:02, 25.21s/it]
{'loss': 0.1085, 'learning_rate': 0.0001576300578034682, 'epoch': 5.93}
{'loss': 0.1076, 'learning_rate': 0.00015745664739884392, 'epoch': 5.94}
{'loss': 0.1046, 'learning_rate': 0.00015728323699421966, 'epoch': 5.94}
59%|████████████████████████████████████████████▌ | 1326/2230 [8:26:31<6:19:31, 25.19s/it]
{'loss': 0.1198, 'learning_rate': 0.00015710982658959536, 'epoch': 5.95}
{'loss': 0.1045, 'learning_rate': 0.00015693641618497108, 'epoch': 5.95}
60%|████████████████████████████████████████████▋ | 1328/2230 [8:27:18<6:06:11, 24.36s/it]
{'loss': 0.0901, 'learning_rate': 0.0001567630057803468, 'epoch': 5.96}
{'loss': 0.0972, 'learning_rate': 0.00015658959537572255, 'epoch': 5.96}
{'loss': 0.0917, 'learning_rate': 0.00015641618497109827, 'epoch': 5.96}
60%|████████████████████████████████████████████▊ | 1331/2230 [8:28:25<5:42:31, 22.86s/it]
{'loss': 0.0869, 'learning_rate': 0.00015624277456647396, 'epoch': 5.97}
[WARNING|modeling_bart.py:1051] 2022-03-27 01:40:02,499 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 01:40:06,427 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:40:14,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1015, 'learning_rate': 0.00015606936416184968, 'epoch': 5.97}
[WARNING|modeling_utils.py:388] 2022-03-27 01:40:24,306 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:40:30,414 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:40:32,830 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:40:38,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0788, 'learning_rate': 0.00015589595375722543, 'epoch': 5.98}
[WARNING|modeling_bart.py:1051] 2022-03-27 01:40:42,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:40:45,219 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 01:40:48,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:40:51,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:40:53,302 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:40:55,417 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 01:40:59,283 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:01,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:03,265 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:05,177 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:07,067 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:08,892 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:10,684 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 60%|████████████████████████████████████████████▉ | 1335/2230 [8:29:39<4:42:24, 18.93s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:12,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:14,263 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:17,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:19,101 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:20,620 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:23,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 60%|████████████████████████████████████████████▉ | 1336/2230 [8:29:52<4:13:33, 17.02s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:24,955 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:27,580 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:28,809 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:31,140 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:33,234 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:35,320 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:37,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:39,798 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 60%|█████████████████████████████████████████████ | 1338/2230 [8:30:09<3:08:45, 12.70s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:43,305 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:47,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:50,810 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:54,426 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:41:58,076 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:42:01,731 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:42:05,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:42:08,954 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 60%|█████████████████████████████████████████████ | 1339/2230 [8:30:39<4:23:42, 17.76s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 01:42:12,639 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:42:16,182 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:42:19,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:42:23,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:42:26,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1261, 'learning_rate': 0.00015468208092485547, 'epoch': 6.01}
{'loss': 0.1063, 'learning_rate': 0.00015450867052023122, 'epoch': 6.01}
{'loss': 0.0978, 'learning_rate': 0.00015433526011560692, 'epoch': 6.02}
{'loss': 0.0947, 'learning_rate': 0.00015416184971098264, 'epoch': 6.02}
{'loss': 0.1061, 'learning_rate': 0.00015398843930635836, 'epoch': 6.03}
{'loss': 0.0974, 'learning_rate': 0.0001538150289017341, 'epoch': 6.03}
{'loss': 0.0876, 'learning_rate': 0.00015364161849710983, 'epoch': 6.04}
 60%|█████████████████████████████████████████████▎ | 1347/2230 [8:34:20<6:32:04, 26.64s/it]
{'loss': 0.098, 'learning_rate': 0.00015346820809248555, 'epoch': 6.04}
{'loss': 0.0721, 'learning_rate': 0.00015329479768786124, 'epoch': 6.04}
{'loss': 0.088, 'learning_rate': 0.000153121387283237, 'epoch': 6.05}
{'loss': 0.0813, 'learning_rate': 0.0001529479768786127, 'epoch': 6.05}
{'loss': 0.0787, 'learning_rate': 0.00015277456647398843, 'epoch': 6.06}
{'loss': 0.0781, 'learning_rate': 0.00015260115606936415, 'epoch': 6.06}
 61%|█████████████████████████████████████████████▌ | 1353/2230 [8:36:58<6:23:00, 26.20s/it]
{'loss': 0.0721, 'learning_rate': 0.0001522543352601156, 'epoch': 6.07}
{'loss': 0.0887, 'learning_rate': 0.0001520809248554913, 'epoch': 6.08}
61%|█████████████████████████████████████████████▌ | 1356/2230 [8:38:14<6:14:22, 25.70s/it]
{'loss': 0.056, 'learning_rate': 0.00015190751445086703, 'epoch': 6.08}
{'loss': 0.0834, 'learning_rate': 0.00015173410404624278, 'epoch': 6.09}
61%|█████████████████████████████████████████████▋ | 1358/2230 [8:39:05<6:11:20, 25.55s/it]
{'loss': 0.0627, 'learning_rate': 0.0001515606936416185, 'epoch': 6.09}
61%|█████████████████████████████████████████████▋ | 1359/2230 [8:39:30<6:08:18, 25.37s/it]
{'loss': 0.0651, 'learning_rate': 0.0001513872832369942, 'epoch': 6.09}
{'loss': 0.0729, 'learning_rate': 0.00015121387283236992, 'epoch': 6.1}
{'loss': 0.0604, 'learning_rate': 0.00015104046242774566, 'epoch': 6.1}
61%|█████████████████████████████████████████████▊ | 1362/2230 [8:40:43<5:58:35, 24.79s/it]
{'loss': 0.0651, 'learning_rate': 0.00015086705202312138, 'epoch': 6.11}
61%|█████████████████████████████████████████████▊ | 1363/2230 [8:41:09<5:59:04, 24.85s/it]
{'loss': 0.0665, 'learning_rate': 0.0001506936416184971, 'epoch': 6.11}
{'loss': 0.0703, 'learning_rate': 0.0001505202312138728, 'epoch': 6.12}
[WARNING|modeling_utils.py:388] 2022-03-27 01:53:28,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0708, 'learning_rate': 0.00015034682080924855, 'epoch': 6.12}
{'loss': 0.0615, 'learning_rate': 0.00015017341040462427, 'epoch': 6.13}
61%|█████████████████████████████████████████████▉ | 1367/2230 [8:42:43<5:41:39, 23.75s/it]
{'loss': 0.0501, 'learning_rate': 0.00015, 'epoch': 6.13}
[WARNING|modeling_utils.py:388] 2022-03-27 01:54:19,346 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
61%|██████████████████████████████████████████████ | 1368/2230 [8:43:06<5:38:15, 23.54s/it]
{'loss': 0.0749, 'learning_rate': 0.0001498265895953757, 'epoch': 6.13}
[WARNING|modeling_bart.py:1051] 2022-03-27 01:54:54,750 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
61%|██████████████████████████████████████████████ | 1369/2230 [8:43:29<5:39:01, 23.63s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 01:55:02,889 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
61%|██████████████████████████████████████████████ | 1370/2230 [8:43:52<5:33:25, 23.26s/it]
{'loss': 0.0688, 'learning_rate': 0.00014930635838150287, 'epoch': 6.15}
[WARNING|modeling_utils.py:388] 2022-03-27 01:55:53,569 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:56:05,695 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0627, 'learning_rate': 0.0001491329479768786, 'epoch': 6.15}
[WARNING|modeling_bart.py:1051] 2022-03-27 01:56:11,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
62%|██████████████████████████████████████████████▏ | 1373/2230 [8:44:57<5:17:14, 22.21s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 01:56:30,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0627, 'learning_rate': 0.0001489595375722543, 'epoch': 6.16}
[WARNING|modeling_bart.py:1051] 2022-03-27 01:56:38,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 01:56:50,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0514, 'learning_rate': 0.00014878612716763003, 'epoch': 6.16}
[WARNING|modeling_utils.py:388] 2022-03-27 01:56:54,698 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:57:04,919 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0587, 'learning_rate': 0.00014861271676300578, 'epoch': 6.17}
[WARNING|modeling_utils.py:388] 2022-03-27 01:57:21,084 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:57:27,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 01:57:35,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 01:57:39,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:57:45,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 01:57:50,033 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
62%|██████████████████████████████████████████████▎ | 1377/2230 [8:46:19<4:54:02, 20.68s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 01:57:54,044 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 01:57:58,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 01:58:02,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:58:04,379 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 01:58:08,498 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
62%|██████████████████████████████████████████████▎ | 1378/2230 [8:46:38<4:44:06, 20.01s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 01:58:12,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:58:14,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 01:58:16,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 01:58:20,684 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:58:22,811 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:58:24,925 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:58:27,024 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:58:29,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:58:31,315 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:58:33,345 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:58:35,383 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:58:37,412 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:58:39,410 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:58:41,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:58:43,358 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:58:45,412 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:58:47,332 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:58:49,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:58:51,154 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:58:53,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:58:54,890 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:58:56,749 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:58:58,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:00,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:02,168 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:04,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:06,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:09,843 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:11,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:13,077 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:14,872 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:18,012 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:19,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:21,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:24,124 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:25,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:28,503 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:29,875 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:32,599 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:33,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:36,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:37,934 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:40,368 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:42,727 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:43,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:46,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:48,580 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:50,719 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:52,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:54,851 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:56,924 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 01:59:58,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:00:00,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:00:02,356 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:00:05,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:00:05,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:00:08,834 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:00:09,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0718, 'learning_rate': 0.00014635838150289015, 'epoch': 6.22}
[WARNING|modeling_bart.py:1051] 2022-03-27 02:00:13,095 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:00:16,765 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:00:20,452 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:00:24,095 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:00:27,730 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:00:31,253 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:00:34,791 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:00:38,342 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:00:41,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:00:45,521 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:00:49,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:00:52,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:00:56,135 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:00:59,676 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:01:03,181 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:01:06,677 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:01:10,287 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:01:13,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:01:17,274 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:01:20,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:01:24,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:01:27,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:01:31,243 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:01:34,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1114, 'learning_rate': 0.00014583815028901734, 'epoch': 6.24}
[WARNING|modeling_bart.py:1051] 2022-03-27 02:01:38,189 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:01:41,598 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:01:45,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:01:48,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:01:51,901 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:01:55,315 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1036, 'learning_rate': 0.00014566473988439306, 'epoch': 6.24}
{'loss': 0.092, 'learning_rate': 0.00014549132947976878, 'epoch': 6.25}
 63%|██████████████████████████████████████████████▉ | 1394/2230 [8:51:26<5:58:15, 25.71s/it]
{'loss': 0.0812, 'learning_rate': 0.0001453179190751445, 'epoch': 6.25}
{'loss': 0.1137, 'learning_rate': 0.00014514450867052022, 'epoch': 6.26}
{'loss': 0.0868, 'learning_rate': 0.00014497109826589594, 'epoch': 6.26}
{'loss': 0.0821, 'learning_rate': 0.00014479768786127166, 'epoch': 6.26}
{'loss': 0.1276, 'learning_rate': 0.00014462427745664738, 'epoch': 6.27}
{'loss': 0.0639, 'learning_rate': 0.0001444508670520231, 'epoch': 6.27}
{'loss': 0.0941, 'learning_rate': 0.00014427745664739882, 'epoch': 6.28}
{'loss': 0.0772, 'learning_rate': 0.00014410404624277454, 'epoch': 6.28}
{'loss': 0.0682, 'learning_rate': 0.00014393063583815026, 'epoch': 6.29}
 63%|███████████████████████████████████████████████▏ | 1403/2230 [8:55:23<5:59:22, 26.07s/it]
{'loss': 0.0796, 'learning_rate': 0.000143757225433526, 'epoch': 6.29}
{'loss': 0.0795, 'learning_rate': 0.0001435838150289017, 'epoch': 6.3}
{'loss': 0.0709, 'learning_rate': 0.00014341040462427745, 'epoch': 6.3}
{'loss': 0.0782, 'learning_rate': 0.00014323699421965317, 'epoch': 6.3}
{'loss': 0.0603, 'learning_rate': 0.0001430635838150289, 'epoch': 6.31}
{'loss': 0.0712, 'learning_rate': 0.00014289017341040462, 'epoch': 6.31}
63%|███████████████████████████████████████████████▍ | 1409/2230 [8:57:55<5:46:21, 25.31s/it]
{'loss': 0.0608, 'learning_rate': 0.00014271676300578034, 'epoch': 6.32}
63%|███████████████████████████████████████████████▍ | 1410/2230 [8:58:20<5:43:04, 25.10s/it]
{'loss': 0.0686, 'learning_rate': 0.00014254335260115606, 'epoch': 6.32}
{'loss': 0.0798, 'learning_rate': 0.00014236994219653178, 'epoch': 6.33}
[WARNING|modeling_utils.py:388] 2022-03-27 02:10:33,811 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0663, 'learning_rate': 0.0001421965317919075, 'epoch': 6.33}
[WARNING|modeling_utils.py:388] 2022-03-27 02:11:04,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0611, 'learning_rate': 0.00014202312138728322, 'epoch': 6.34}
[WARNING|modeling_utils.py:388] 2022-03-27 02:11:04,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0591, 'learning_rate': 0.00014184971098265894, 'epoch': 6.34}
[WARNING|modeling_utils.py:388] 2022-03-27 02:11:04,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:11:04,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0665, 'learning_rate': 0.00014167630057803466, 'epoch': 6.35}
[WARNING|modeling_utils.py:388] 2022-03-27 02:11:04,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:12:14,947 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:12:14,947 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0568, 'learning_rate': 0.00014150289017341038, 'epoch': 6.35}
[WARNING|modeling_utils.py:388] 2022-03-27 02:12:25,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:12:25,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0707, 'learning_rate': 0.00014132947976878613, 'epoch': 6.35}
[WARNING|modeling_utils.py:388] 2022-03-27 02:13:04,252 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1241, 'learning_rate': 0.00014115606936416182, 'epoch': 6.36}
[WARNING|modeling_bart.py:1051] 2022-03-27 02:13:16,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:13:16,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 02:13:31,015 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:13:31,015 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0778, 'learning_rate': 0.0001408092485549133, 'epoch': 6.37}
[WARNING|modeling_utils.py:388] 2022-03-27 02:13:31,015 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:13:57,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:13:57,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0621, 'learning_rate': 0.000140635838150289, 'epoch': 6.37}
[WARNING|modeling_utils.py:388] 2022-03-27 02:13:57,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:14:26,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
 64%|███████████████████████████████████████████████▊ | 1422/2230 [9:03:02<5:05:48, 22.71s/it]
{'loss': 0.062, 'learning_rate': 0.00014046242774566473, 'epoch': 6.38}
 64%|███████████████████████████████████████████████▊ | 1422/2230 [9:03:02<5:05:48, 22.71s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 02:14:55,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:14:59,188 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:14:59,188 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:15:13,600 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
 64%|███████████████████████████████████████████████▉ | 1424/2230 [9:03:45<4:54:58, 21.96s/it]
{'loss': 0.073, 'learning_rate': 0.00014011560693641617, 'epoch': 6.39}
 64%|███████████████████████████████████████████████▉ | 1424/2230 [9:03:45<4:54:58, 21.96s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 02:15:24,151 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 02:15:36,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0475, 'learning_rate': 0.0001399421965317919, 'epoch': 6.39}
[WARNING|modeling_utils.py:388] 2022-03-27 02:15:48,287 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:15:54,569 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 02:16:02,859 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 02:16:06,953 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:16:13,004 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 02:16:17,287 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
64%|███████████████████████████████████████████████▉ | 1427/2230 [9:04:46<4:38:18, 20.80s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 02:16:21,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:16:27,239 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:16:29,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:16:35,369 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:16:37,722 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0532, 'learning_rate': 0.00013942196531791906, 'epoch': 6.4}
[WARNING|modeling_bart.py:1051] 2022-03-27 02:16:41,944 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:16:45,246 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:16:47,445 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:16:49,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:16:51,771 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:16:53,901 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
64%|████████████████████████████████████████████████ | 1429/2230 [9:05:23<4:19:21, 19.43s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:16:56,104 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:16:58,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:00,241 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:02,304 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:04,312 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:06,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:08,320 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:10,308 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
64%|████████████████████████████████████████████████ | 1430/2230 [9:05:39<4:06:36, 18.50s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:12,381 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:14,348 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:16,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:18,101 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:19,970 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:21,796 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:23,653 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
64%|████████████████████████████████████████████████▏ | 1431/2230 [9:05:54<3:52:24, 17.45s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:27,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:29,118 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:31,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:33,509 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:36,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:38,589 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:40,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
64%|████████████████████████████████████████████████▏ | 1432/2230 [9:06:09<3:41:02, 16.62s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:42,001 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:45,193 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:46,757 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:48,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:51,348 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:52,854 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
64%|████████████████████████████████████████████████▏ | 1433/2230 [9:06:21<3:24:27, 15.39s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:54,439 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:57,264 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:17:58,626 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:18:01,335 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:18:02,668 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
64%|████████████████████████████████████████████████▏ | 1434/2230 [9:06:32<3:06:35, 14.06s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:18:05,320 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:18:06,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:18:09,072 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:18:11,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:18:13,738 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
64%|████████████████████████████████████████████████▎ | 1435/2230 [9:06:42<2:48:55, 12.75s/it][WARNING|modeling_bart.py:1051] 2022-03-27 02:18:14,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 64%|████████████████████████████████████████████████▎ | 1435/2230 [9:06:42<2:48:55, 12.75s/it][WARNING|modeling_bart.py:1051] 2022-03-27 02:18:14,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:18:17,116 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:18:14,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:18:19,210 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:18:14,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:18:21,237 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:18:14,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 64%|████████████████████████████████████████████████▎ | 1436/2230 [9:06:50<2:31:29, 11.45s/it][WARNING|modeling_bart.py:1051] 2022-03-27 02:18:23,310 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 64%|████████████████████████████████████████████████▎ | 1436/2230 [9:06:50<2:31:29, 11.45s/it][WARNING|modeling_bart.py:1051] 2022-03-27 02:18:23,310 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:18:25,198 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-27 02:18:23,310 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:18:27,037 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:18:23,310 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:18:29,731 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:18:23,310 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:18:31,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:18:30,711 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:18:31,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:18:30,711 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:18:33,244 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:18:30,711 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:18:35,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:18:30,711 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 64%|████████████████████████████████████████████████▎ | 1438/2230 [9:07:05<2:03:25, 9.35s/it] Setting `use_cache=False`...1] 2022-03-27 02:18:30,711 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... 64%|████████████████████████████████████████████████▎ | 1438/2230 [9:07:05<2:03:25, 9.35s/it] Setting `use_cache=False`...1] 2022-03-27 02:18:30,711 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 64%|████████████████████████████████████████████████▎ | 1438/2230 [9:07:05<2:03:25, 9.35s/it][WARNING|modeling_bart.py:1051] 2022-03-27 02:18:39,078 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 64%|████████████████████████████████████████████████▎ | 1438/2230 [9:07:05<2:03:25, 9.35s/it][WARNING|modeling_bart.py:1051] 2022-03-27 02:18:39,078 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:18:42,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:18:39,078 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:18:42,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:18:39,078 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:18:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:18:39,078 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:18:50,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:18:39,078 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:18:50,102 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:18:53,737 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:18:57,395 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:19:00,943 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:19:04,493 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 65%|████████████████████████████████████████████████▍ | 1439/2230 [9:07:34<3:22:02, 15.32s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:19:08,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1585, 'learning_rate': 0.000137514450867052, 'epoch': 6.45}
[WARNING|modeling_bart.py:1051] 2022-03-27 02:19:11,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:19:15,272 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:19:18,793 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:19:22,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:19:25,881 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:19:29,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:19:32,872 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...
 65%|████████████████████████████████████████████████▍ | 1440/2230 [9:08:03<4:13:10, 19.23s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:19:36,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:19:39,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:19:43,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:19:46,916 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:19:50,419 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:19:53,914 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:19:57,377 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:20:00,859 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 65%|████████████████████████████████████████████████▍ | 1441/2230 [9:08:31<4:47:19, 21.85s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:20:07,882 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:20:11,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:20:14,706 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:20:18,160 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1004, 'learning_rate': 0.00013699421965317917, 'epoch': 6.47}
{'loss': 0.1034, 'learning_rate': 0.00013682080924855492, 'epoch': 6.47}
{'loss': 0.0949, 'learning_rate': 0.00013664739884393061, 'epoch': 6.48}
{'loss': 0.0756, 'learning_rate': 0.00013647398843930636, 'epoch': 6.48}
{'loss': 0.0965, 'learning_rate': 0.00013630057803468206, 'epoch': 6.48}
{'loss': 0.0813, 'learning_rate': 0.0001361271676300578, 'epoch': 6.49}
{'loss': 0.0904, 'learning_rate': 0.00013595375722543352, 'epoch': 6.49}
{'loss': 0.0747, 'learning_rate': 0.00013578034682080925, 'epoch': 6.5}
[WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0892, 'learning_rate': 0.00013560693641618497, 'epoch': 6.5} [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0595, 'learning_rate': 0.0001354335260115607, 'epoch': 6.51} [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0702, 'learning_rate': 0.0001352601156069364, 'epoch': 6.51} [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:20:21,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0634, 'learning_rate': 0.00013491329479768785, 'epoch': 6.52} 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0731, 'learning_rate': 0.00013473988439306357, 'epoch': 6.52} 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.076, 'learning_rate': 0.0001345664739884393, 'epoch': 6.53} 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it] Setting `use_cache=False`...1] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
65%|████████████████████████████████████████████████▊ | 1453/2230 [9:13:50<5:36:10, 25.96s/it]
{'loss': 0.0652, 'learning_rate': 0.00013439306358381504, 'epoch': 6.53}
{'loss': 0.0547, 'learning_rate': 0.00013421965317919073, 'epoch': 6.54}
{'loss': 0.0743, 'learning_rate': 0.00013404624277456648, 'epoch': 6.54}
{'loss': 0.0592, 'learning_rate': 0.00013387283236994217, 'epoch': 6.55}
[WARNING|modeling_bart.py:1051] 2022-03-27 02:28:24,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0639, 'learning_rate': 0.00013369942196531792, 'epoch': 6.55}
[WARNING|modeling_utils.py:388] 2022-03-27 02:28:55,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.06, 'learning_rate': 0.00013352601156069364, 'epoch': 6.56}
[WARNING|modeling_bart.py:1051] 2022-03-27 02:29:15,784 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0712, 'learning_rate': 0.00013335260115606936, 'epoch': 6.56}
{'loss': 0.065, 'learning_rate': 0.00013317919075144508, 'epoch': 6.57}
{'loss': 0.0553, 'learning_rate': 0.00013300578034682078, 'epoch': 6.57}
{'loss': 0.0654, 'learning_rate': 0.00013283236994219652, 'epoch': 6.57}
[WARNING|modeling_bart.py:1051] 2022-03-27 02:29:15,784 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 02:29:15,784 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 66%|█████████████████████████████████████████████████▎ | 1467/2230 [9:19:34<5:03:15, 23.85s/it] Setting `use_cache=False`...e computed-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 66%|█████████████████████████████████████████████████▎ | 1467/2230 [9:19:34<5:03:15, 23.85s/it] Setting `use_cache=False`...e computed-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0631, 'learning_rate': 0.00013265895953757224, 'epoch': 6.58} 66%|█████████████████████████████████████████████████▎ | 1467/2230 [9:19:34<5:03:15, 23.85s/it] Setting `use_cache=False`...e computed-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 66%|█████████████████████████████████████████████████▎ | 1467/2230 [9:19:34<5:03:15, 23.85s/it] Setting `use_cache=False`...e computed-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 66%|█████████████████████████████████████████████████▎ | 1467/2230 [9:19:34<5:03:15, 23.85s/it] Setting `use_cache=False`...e computed-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
66%|█████████████████████████████████████████████████▎ | 1467/2230 [9:19:34<5:03:15, 23.85s/it] Setting `use_cache=False`...e computed-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 66%|█████████████████████████████████████████████████▎ | 1467/2230 [9:19:34<5:03:15, 23.85s/it] Setting `use_cache=False`...e computed-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 66%|█████████████████████████████████████████████████▎ | 1467/2230 [9:19:34<5:03:15, 23.85s/it] Setting `use_cache=False`...e computed-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 66%|█████████████████████████████████████████████████▎ | 1467/2230 [9:19:34<5:03:15, 23.85s/it] Setting `use_cache=False`...e computed-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 66%|█████████████████████████████████████████████████▎ | 1467/2230 [9:19:34<5:03:15, 23.85s/it] Setting `use_cache=False`...e computed-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 66%|█████████████████████████████████████████████████▎ | 1467/2230 [9:19:34<5:03:15, 23.85s/it] Setting `use_cache=False`...e computed-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 66%|█████████████████████████████████████████████████▎ | 1467/2230 [9:19:34<5:03:15, 23.85s/it] Setting `use_cache=False`...e computed-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 66%|█████████████████████████████████████████████████▎ | 1467/2230 [9:19:34<5:03:15, 23.85s/it] Setting `use_cache=False`...e computed-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0729, 'learning_rate': 0.00013248554913294797, 'epoch': 6.58} 66%|█████████████████████████████████████████████████▎ | 1467/2230 [9:19:34<5:03:15, 23.85s/it] Setting `use_cache=False`...e computed-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 66%|█████████████████████████████████████████████████▎ | 1467/2230 [9:19:34<5:03:15, 23.85s/it] Setting `use_cache=False`...e computed-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 66%|█████████████████████████████████████████████████▎ | 1467/2230 [9:19:34<5:03:15, 23.85s/it] Setting `use_cache=False`...e computed-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 66%|█████████████████████████████████████████████████▎ | 1467/2230 [9:19:34<5:03:15, 23.85s/it] Setting `use_cache=False`...e computed-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 66%|█████████████████████████████████████████████████▎ | 1467/2230 [9:19:34<5:03:15, 23.85s/it] Setting `use_cache=False`...e computed-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 66%|█████████████████████████████████████████████████▎ | 1467/2230 [9:19:34<5:03:15, 23.85s/it] Setting `use_cache=False`...e computed-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 66%|█████████████████████████████████████████████████▎ | 1467/2230 [9:19:34<5:03:15, 23.85s/it] Setting `use_cache=False`...e computed-27 02:20:04,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
66%|█████████████████████████████████████████████████▍ | 1469/2230 [9:20:22<5:00:38, 23.70s/it]
{'loss': 0.0612, 'learning_rate': 0.00013231213872832369, 'epoch': 6.59}
[WARNING|modeling_utils.py:388] 2022-03-27 02:32:12,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.049, 'learning_rate': 0.0001321387283236994, 'epoch': 6.59}
[WARNING|modeling_utils.py:388] 2022-03-27 02:32:30,858 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0715, 'learning_rate': 0.00013196531791907513, 'epoch': 6.6}
[WARNING|modeling_utils.py:388] 2022-03-27 02:32:51,427 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:32:55,470 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0667, 'learning_rate': 0.00013179190751445085, 'epoch': 6.6}
[WARNING|modeling_utils.py:388] 2022-03-27 02:33:15,474 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0601, 'learning_rate': 0.00013161849710982657, 'epoch': 6.61}
[WARNING|modeling_bart.py:1051] 2022-03-27 02:33:28,275 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:33:38,634 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0543, 'learning_rate': 0.0001314450867052023, 'epoch': 6.61}
[WARNING|modeling_bart.py:1051] 2022-03-27 02:33:52,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 02:33:57,012 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 02:34:02,807 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0533, 'learning_rate': 0.000131271676300578, 'epoch': 6.61}
[WARNING|modeling_bart.py:1051] 2022-03-27 02:34:08,876 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:34:15,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 02:34:19,139 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:34:25,354 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:34:31,403 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:34:33,753 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:34:39,651 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
66%|█████████████████████████████████████████████████▋ | 1477/2230 [9:23:11<4:17:43, 20.54s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:34:44,061 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0581, 'learning_rate': 0.00013092485549132945, 'epoch': 6.62}
[WARNING|modeling_utils.py:388] 2022-03-27 02:34:47,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 02:34:52,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 02:34:56,050 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:34:58,316 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
66%|█████████████████████████████████████████████████▋ | 1478/2230 [9:23:29<4:09:54, 19.94s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:35:02,588 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0631, 'learning_rate': 0.0001307514450867052, 'epoch': 6.63}
[WARNING|modeling_utils.py:388] 2022-03-27 02:35:06,399 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:35:08,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 02:35:12,625 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:35:14,791 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:35:16,953 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:35:19,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 02:35:22,762 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:35:24,837 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 02:35:26,962 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 02:35:30,658 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:35:32,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:35:34,606 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
66%|█████████████████████████████████████████████████▊ | 1480/2230 [9:24:04<3:50:31, 18.44s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:35:36,703 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:35:38,634 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:35:40,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:35:42,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:35:44,352 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:35:46,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:35:48,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
66%|█████████████████████████████████████████████████▊ | 1481/2230 [9:24:19<3:38:21, 17.49s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:35:51,938 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:35:53,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:35:56,400 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:35:58,195 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:35:59,928 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:03,312 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:04,994 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
66%|█████████████████████████████████████████████████▊ | 1482/2230 [9:24:34<3:28:17, 16.71s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:06,801 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:08,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:10,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:13,237 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:14,784 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:16,331 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
67%|█████████████████████████████████████████████████▉ | 1483/2230 [9:24:46<3:13:17, 15.53s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:19,460 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:20,909 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:23,690 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:25,062 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:27,738 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
67%|█████████████████████████████████████████████████▉ | 1484/2230 [9:24:57<2:56:17, 14.18s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:30,416 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:31,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:34,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:35,272 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:37,626 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
67%|█████████████████████████████████████████████████▉ | 1485/2230 [9:25:07<2:39:09, 12.82s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:40,009 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:42,197 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:43,265 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:45,370 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
67%|█████████████████████████████████████████████████▉ | 1486/2230 [9:25:16<2:22:41, 11.51s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:48,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:50,259 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:52,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:53,890 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
67%|██████████████████████████████████████████████████ | 1487/2230 [9:25:23<2:07:18, 10.28s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:55,785 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:58,337 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:36:59,994 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:37:02,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
67%|██████████████████████████████████████████████████ | 1488/2230 [9:25:30<1:56:11, 9.40s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:37:04,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:37:07,845 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:37:11,463 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:37:15,078 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:37:18,666 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:37:22,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:37:25,792 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:37:29,382 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
67%|██████████████████████████████████████████████████ | 1489/2230 [9:25:59<3:08:44, 15.28s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:37:33,043 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:37:36,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:37:40,114 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:37:43,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:37:47,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:37:50,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:37:54,201 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:37:57,681 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
67%|██████████████████████████████████████████████████ | 1490/2230 [9:26:28<3:56:25, 19.17s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:38:01,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:38:04,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:38:08,315 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:38:11,784 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:38:15,250 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:38:18,715 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:38:22,091 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:38:25,444 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
67%|██████████████████████████████████████████████████▏ | 1491/2230 [9:26:55<4:27:49, 21.74s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:38:32,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:38:35,757 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:38:39,174 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:38:42,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 02:38:46,007 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
67%|██████████████████████████████████████████████████▏ | 1492/2230 [9:27:23<4:48:13, 23.43s/it]
{'loss': 0.0985, 'learning_rate': 0.00012832369942196532, 'epoch': 6.69}
{'loss': 0.089, 'learning_rate': 0.00012815028901734104, 'epoch': 6.7}
{'loss': 0.0951, 'learning_rate': 0.00012797687861271676, 'epoch': 6.7}
{'loss': 0.0904, 'learning_rate': 0.00012780346820809248, 'epoch': 6.7}
{'loss': 0.0874, 'learning_rate': 0.0001276300578034682, 'epoch': 6.71}
{'loss': 0.0686, 'learning_rate': 0.00012745664739884392, 'epoch': 6.71}
{'loss': 0.0789, 'learning_rate': 0.00012728323699421964, 'epoch': 6.72}
{'loss': 0.0757, 'learning_rate': 0.00012710982658959536, 'epoch': 6.72}
[INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642
[INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 02:42:31,041 >> Num examples = 2642 | 1492/2230 [9:27:23<4:48:13, 23.43s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
03/27/2022 02:52:02 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow
{'eval_loss': 0.3513650596141815, 'eval_wer': 0.10093216977389925, 'eval_runtime': 571.1505, 'eval_samples_per_second': 4.626, 'eval_steps_per_second': 0.58, 'epoch': 6.73}
{'loss': 0.0772, 'learning_rate': 0.0001267630057803468, 'epoch': 6.73}
{'loss': 0.0731, 'learning_rate': 0.00012658959537572252, 'epoch': 6.74}
{'loss': 0.0637, 'learning_rate': 0.00012641618497109824, 'epoch': 6.74}
67%|█████████████████████████████████████████████████▉ | 1504/2230 [9:44:17<19:46:16, 98.04s/it]
{'loss': 0.0737, 'learning_rate': 0.000126242774566474, 'epoch': 6.74}
{'loss': 0.0885, 'learning_rate': 0.00012606936416184968, 'epoch': 6.75}
{'loss': 0.0575, 'learning_rate': 0.00012589595375722543, 'epoch': 6.75}
{'loss': 0.0623, 'learning_rate': 0.00012572254335260115, 'epoch': 6.76}
{'loss': 0.0807, 'learning_rate': 0.00012554913294797687, 'epoch': 6.76}
{'loss': 0.0606, 'learning_rate': 0.0001253757225433526, 'epoch': 6.77}
{'loss': 0.0596, 'learning_rate': 0.00012520231213872831, 'epoch': 6.77}
68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it]
{'loss': 0.066, 'learning_rate': 0.00012502890173410404, 'epoch': 6.78}
{'loss': 0.0648, 'learning_rate': 0.00012485549132947976, 'epoch': 6.78}
68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▊ | 1511/2230 [9:47:15<6:11:20, 30.99s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0554, 'learning_rate': 0.00012468208092485548, 'epoch': 6.78} 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0601, 'learning_rate': 0.0001245086705202312, 'epoch': 6.79} 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|██████████████████████████████████████████████████▉ | 1513/2230 [9:48:05<5:32:10, 27.80s/it] Setting `use_cache=False`...1] 2022-03-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0526, 'learning_rate': 0.00012433526011560692, 'epoch': 6.79}
[WARNING|modeling_bart.py:1051] 2022-03-27 03:00:33,024 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 03:00:39,317 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0653, 'learning_rate': 0.00012416184971098267, 'epoch': 6.8}
[WARNING|modeling_utils.py:388] 2022-03-27 03:01:05,552 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
68%|███████████████████████████████████████████████████ | 1517/2230 [9:49:39<4:51:33, 24.53s/it]
{'loss': 0.0682, 'learning_rate': 0.00012398843930635836, 'epoch': 6.8}
[WARNING|modeling_utils.py:388] 2022-03-27 03:01:20,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0612, 'learning_rate': 0.0001238150289017341, 'epoch': 6.81}
{'loss': 0.0616, 'learning_rate': 0.0001236416184971098, 'epoch': 6.81}
{'loss': 0.0609, 'learning_rate': 0.00012346820809248555, 'epoch': 6.82}
[WARNING|modeling_utils.py:388] 2022-03-27 03:02:23,435 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:02:27,585 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:02:35,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0565, 'learning_rate': 0.00012329479768786127, 'epoch': 6.82}
[WARNING|modeling_utils.py:388] 2022-03-27 03:02:51,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:02:55,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
68%|███████████████████████████████████████████████████▏ | 1522/2230 [9:51:32<4:26:41, 22.60s/it]g-point operations will not be computed-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|███████████████████████████████████████████████████▏ | 1522/2230 [9:51:32<4:26:41, 22.60s/it]g-point operations will not be computed-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0499, 'learning_rate': 0.000123121387283237, 'epoch': 6.83} 68%|███████████████████████████████████████████████████▏ | 1522/2230 [9:51:32<4:26:41, 22.60s/it]g-point operations will not be computed-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|███████████████████████████████████████████████████▏ | 1522/2230 [9:51:32<4:26:41, 22.60s/it]g-point operations will not be computed-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|███████████████████████████████████████████████████▏ | 1522/2230 [9:51:32<4:26:41, 22.60s/it]g-point operations will not be computed-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|███████████████████████████████████████████████████▏ | 1522/2230 [9:51:32<4:26:41, 22.60s/it]g-point operations will not be computed-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 68%|███████████████████████████████████████████████████▏ | 1522/2230 [9:51:32<4:26:41, 22.60s/it]g-point operations will not be computed-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:03:18,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:03:18,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:03:18,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:03:18,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:03:18,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:03:18,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0652, 'learning_rate': 0.0001229479768786127, 'epoch': 6.83} [WARNING|modeling_utils.py:388] 2022-03-27 03:03:18,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 02:38:28,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:03:33,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:03:39,630 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:03:47,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:03:51,237 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 03:03:59,586 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:04:02,107 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:04:05,856 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0348, 'learning_rate': 0.00012260115606936415, 'epoch': 6.84}
[WARNING|modeling_utils.py:388] 2022-03-27 03:04:09,896 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:04:12,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:04:21,978 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 03:04:26,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0554, 'learning_rate': 0.00012242774566473987, 'epoch': 6.84}
[WARNING|modeling_utils.py:388] 2022-03-27 03:04:30,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:04:36,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:04:38,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:04:44,615 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:04:47,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 03:04:51,128 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 03:04:54,915 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:04:57,146 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:01,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:03,300 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:05,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 03:05:09,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:13,034 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:15,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:17,222 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:19,267 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
69%|███████████████████████████████████████████████████▍ | 1529/2230 [9:53:48<3:40:04, 18.84s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:21,424 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:23,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:25,487 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:27,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:29,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:31,378 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:33,307 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:35,209 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
69%|███████████████████████████████████████████████████▍ | 1530/2230 [9:54:04<3:29:24, 17.95s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:37,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:39,146 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:41,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:42,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:44,767 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:46,580 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:50,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
69%|███████████████████████████████████████████████████▍ | 1531/2230 [9:54:19<3:18:12, 17.01s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:52,001 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:53,757 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:56,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:58,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:05:59,676 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:01,339 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:02,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
69%|███████████████████████████████████████████████████▌ | 1532/2230 [9:54:33<3:08:51, 16.23s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:06,426 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:08,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:09,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:11,263 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:14,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:15,894 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
69%|███████████████████████████████████████████████████▌ | 1533/2230 [9:54:46<2:56:09, 15.16s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:19,018 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:20,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:21,860 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:24,642 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:26,033 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:28,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
69%|███████████████████████████████████████████████████▌ | 1534/2230 [9:54:57<2:42:12, 13.98s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:30,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:32,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:34,000 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:36,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:38,804 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
69%|███████████████████████████████████████████████████▋ | 1535/2230 [9:55:07<2:27:58, 12.77s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:40,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:42,310 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:44,463 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:46,554 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
69%|███████████████████████████████████████████████████▋ | 1536/2230 [9:55:16<2:13:29, 11.54s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:48,681 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:50,623 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:53,382 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:55,208 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:56,200 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:57,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:06:58,741 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:01,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
69%|███████████████████████████████████████████████████▋ | 1538/2230 [9:55:31<1:49:01, 9.45s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:04,634 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:08,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:12,018 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:15,636 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:19,217 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:22,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:26,425 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:29,943 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
69%|███████████████████████████████████████████████████▊ | 1539/2230 [9:56:00<2:56:45, 15.35s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:33,559 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:37,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:40,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:44,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:47,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:51,004 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:54,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:07:57,902 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
69%|███████████████████████████████████████████████████▊ | 1540/2230 [9:56:28<3:40:22, 19.16s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:01,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:05,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:08,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:11,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:15,218 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:18,638 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:22,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:25,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
69%|███████████████████████████████████████████████████▊ | 1541/2230 [9:56:55<4:08:24, 21.63s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:32,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:35,828 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:39,182 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:42,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:45,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0729, 'learning_rate': 0.00011965317919075144, 'epoch': 6.91}
{'loss': 0.0855, 'learning_rate': 0.00011947976878612715, 'epoch': 6.92}
{'loss': 0.0607, 'learning_rate': 0.00011930635838150289, 'epoch': 6.92}
{'loss': 0.0633, 'learning_rate': 0.0001191329479768786, 'epoch': 6.93}
{'loss': 0.0864, 'learning_rate': 0.00011895953757225433, 'epoch': 6.93}
{'loss': 0.0667, 'learning_rate': 0.00011878612716763005, 'epoch': 6.94}
[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0621, 'learning_rate': 0.00011861271676300578, 'epoch': 6.94} [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0582, 'learning_rate': 0.00011843930635838149, 'epoch': 6.95} [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0563, 'learning_rate': 0.00011826589595375722, 'epoch': 6.95} [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:08:49,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 70%|███████████████████████████████████████████████████▍ | 1551/2230 [10:01:11<4:38:01, 24.57s/it] Setting `use_cache=False`...1] 2022-03-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0595, 'learning_rate': 0.00011809248554913293, 'epoch': 6.96}
[WARNING|modeling_utils.py:388] 2022-03-27 03:12:47,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
70%|███████████████████████████████████████████████████▌ | 1552/2230 [10:01:34<4:32:45, 24.14s/it]
{'loss': 0.0622, 'learning_rate': 0.00011791907514450866, 'epoch': 6.96}
{'loss': 0.0489, 'learning_rate': 0.00011774566473988439, 'epoch': 6.96}
[WARNING|modeling_utils.py:388] 2022-03-27 03:13:40,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:13:44,933 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:13:52,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:14:03,230 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:14:09,564 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0699, 'learning_rate': 0.00011739884393063583, 'epoch': 6.97}
[WARNING|modeling_bart.py:1051] 2022-03-27 03:14:17,952 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:14:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 03:14:31,706 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:14:33,977 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 03:14:37,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:14:40,098 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 03:14:43,709 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:14:45,793 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:14:47,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:14:50,044 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:14:52,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:14:53,951 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:14:55,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:14:57,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:14:59,603 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:15:01,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:15:03,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:15:06,719 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:15:08,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:15:10,006 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:15:11,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:15:14,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:15:15,953 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:15:18,715 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:15:19,985 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:15:22,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:15:24,572 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:15:26,689 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:15:28,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:15:30,371 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:15:32,775 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:15:35,292 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:15:38,954 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:15:38,954 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:15:42,658 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:15:42,658 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:15:46,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:15:46,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:15:49,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:15:53,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:15:53,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:15:57,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:15:57,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:00,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:00,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.082, 'learning_rate': 0.00011618497109826587, 'epoch': 7.0} [WARNING|modeling_utils.py:388] 2022-03-27 03:16:04,310 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:16:07,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:07,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:11,366 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:11,366 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:14,938 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:14,938 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0802, 'learning_rate': 0.0001160115606936416, 'epoch': 7.01} [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0734, 'learning_rate': 0.00011583815028901733, 'epoch': 7.01} [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.075, 'learning_rate': 0.00011566473988439306, 'epoch': 7.02} [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.074, 'learning_rate': 0.00011549132947976877, 'epoch': 7.02} [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0624, 'learning_rate': 0.0001153179190751445, 'epoch': 7.03} [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0662, 'learning_rate': 0.00011514450867052021, 'epoch': 7.03} [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:16:19,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:08:29,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0631, 'learning_rate': 0.00011497109826589594, 'epoch': 7.04}
{'loss': 0.0493, 'learning_rate': 0.00011479768786127166, 'epoch': 7.04}
{'loss': 0.0642, 'learning_rate': 0.00011462427745664738, 'epoch': 7.04}
{'loss': 0.0557, 'learning_rate': 0.0001144508670520231, 'epoch': 7.05}
{'loss': 0.0515, 'learning_rate': 0.00011427745664739884, 'epoch': 7.05}
71%|████████████████████████████████████████████████████▏ | 1574/2230 [10:09:57<4:49:08, 26.45s/it]
{'loss': 0.0615, 'learning_rate': 0.00011410404624277455, 'epoch': 7.06}
[WARNING|modeling_utils.py:388] 2022-03-27 03:21:50,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0486, 'learning_rate': 0.00011393063583815028, 'epoch': 7.06}
{'loss': 0.0503, 'learning_rate': 0.00011375722543352599, 'epoch': 7.07}
{'loss': 0.0713, 'learning_rate': 0.00011358381502890172, 'epoch': 7.07}
{'loss': 0.0545, 'learning_rate': 0.00011341040462427744, 'epoch': 7.08}
{'loss': 0.0459, 'learning_rate': 0.00011323699421965318, 'epoch': 7.08}
{'loss': 0.0357, 'learning_rate': 0.00011306358381502888, 'epoch': 7.09}
{'loss': 0.0504, 'learning_rate': 0.00011289017341040462, 'epoch': 7.09}
{'loss': 0.0427, 'learning_rate': 0.00011271676300578033, 'epoch': 7.09}
{'loss': 0.0584, 'learning_rate': 0.00011254335260115606, 'epoch': 7.1}
{'loss': 0.0566, 'learning_rate': 0.00011236994219653178, 'epoch': 7.1}
{'loss': 0.0452, 'learning_rate': 0.0001121965317919075, 'epoch': 7.11}
 71%|████████████████████████████████████████████████████▋ | 1586/2230 [10:15:00<4:25:17, 24.72s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 03:26:37,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0471, 'learning_rate': 0.00011184971098265896, 'epoch': 7.12}
{'loss': 0.0488, 'learning_rate': 0.00011167630057803466, 'epoch': 7.12}
{'loss': 0.0331, 'learning_rate': 0.0001115028901734104, 'epoch': 7.13}
{'loss': 0.0503, 'learning_rate': 0.00011132947976878612, 'epoch': 7.13}
[WARNING|modeling_utils.py:388] 2022-03-27 03:28:26,006 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
71%|████████████████████████████████████████████████████▊ | 1591/2230 [10:16:59<4:13:09, 23.77s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 03:28:38,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
71%|████████████████████████████████████████████████████▊ | 1592/2230 [10:17:22<4:10:27, 23.55s/it]
{'loss': 0.0457, 'learning_rate': 0.00011098265895953756, 'epoch': 7.14}
{'loss': 0.0453, 'learning_rate': 0.0001108092485549133, 'epoch': 7.14}
{'loss': 0.048, 'learning_rate': 0.000110635838150289, 'epoch': 7.15}
[WARNING|modeling_utils.py:388] 2022-03-27 03:29:47,883 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.043, 'learning_rate': 0.00011046242774566474, 'epoch': 7.15}
[WARNING|modeling_bart.py:1051] 2022-03-27 03:30:06,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 03:30:16,267 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
72%|████████████████████████████████████████████████████▉ | 1596/2230 [10:18:52<3:57:22, 22.46s/it]
{'loss': 0.0318, 'learning_rate': 0.00011028901734104044, 'epoch': 7.16}
[WARNING|modeling_utils.py:388] 2022-03-27 03:30:38,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0444, 'learning_rate': 0.00011011560693641618, 'epoch': 7.16}
[WARNING|modeling_utils.py:388] 2022-03-27 03:30:53,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 03:31:05,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0316, 'learning_rate': 0.0001099421965317919, 'epoch': 7.17}
[WARNING|modeling_utils.py:388] 2022-03-27 03:31:13,599 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:31:19,842 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:31:26,007 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.0392, 'learning_rate': 0.00010976878612716762, 'epoch': 7.17} [WARNING|modeling_utils.py:388] 2022-03-27 03:31:35,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 03:31:40,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:31:44,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 03:31:48,289 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:31:50,644 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:31:54,554 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 03:31:58,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:32:02,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:32:04,909 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.0407, 'learning_rate': 0.00010942196531791907, 'epoch': 7.18} [WARNING|modeling_bart.py:1051] 2022-03-27 03:32:09,113 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:32:12,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-27 03:32:14,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-27 03:32:17,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:32:19,212 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-27 03:32:21,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-27 03:32:23,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-27 03:32:25,571 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 03:32:29,252 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:32:31,265 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:32:33,239 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:32:35,237 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:32:37,193 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 72%|█████████████████████████████████████████████████████▏ | 1603/2230 [10:21:06<3:12:30, 18.42s/it] [WARNING|modeling_bart.py:1051] 2022-03-27 03:32:39,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:32:41,150 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:32:43,035 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:32:44,908 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:32:46,748 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:32:48,580 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:32:50,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 72%|█████████████████████████████████████████████████████▏ | 1604/2230 [10:21:21<3:01:24, 17.39s/it] [WARNING|modeling_bart.py:1051] 2022-03-27 03:32:54,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:32:57,724 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:32:59,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:01,177 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:02,878 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:04,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 72%|█████████████████████████████████████████████████████▎ | 1605/2230 [10:21:35<2:50:15, 16.34s/it] [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:08,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:09,655 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:11,265 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:12,826 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:15,922 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:17,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 72%|█████████████████████████████████████████████████████▎ | 1606/2230 [10:21:47<2:37:52, 15.18s/it] [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:20,404 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:21,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:33:24,026 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:25,404 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:28,087 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:29,400 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 72%|█████████████████████████████████████████████████████▎ | 1607/2230 [10:21:59<2:26:56, 14.15s/it] [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:32,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:33,430 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:35,933 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:37,158 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:40,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 72%|█████████████████████████████████████████████████████▎ | 1608/2230 [10:22:09<2:13:38, 12.89s/it] [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:42,021 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:44,286 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:45,392 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:47,534 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
72%|█████████████████████████████████████████████████████▍ | 1609/2230 [10:22:18<2:00:25, 11.64s/it] [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:50,655 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:51,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:54,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:56,166 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 72%|█████████████████████████████████████████████████████▍ | 1610/2230 [10:22:25<1:47:07, 10.37s/it] [WARNING|modeling_bart.py:1051] 2022-03-27 03:33:58,026 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:34:00,549 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:34:02,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 72%|█████████████████████████████████████████████████████▍ | 1611/2230 [10:22:32<1:34:49, 9.19s/it] [WARNING|modeling_bart.py:1051] 2022-03-27 03:34:05,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:34:09,207 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:34:12,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:16,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:34:20,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:34:23,944 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:27,546 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:34:31,134 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 72%|█████████████████████████████████████████████████████▍ | 1612/2230 [10:23:01<2:37:04, 15.25s/it] [WARNING|modeling_bart.py:1051] 2022-03-27 03:34:34,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0953, 'learning_rate': 0.00010751445086705201, 'epoch': 7.23} [WARNING|modeling_bart.py:1051] 2022-03-27 03:34:38,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:34:41,937 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:34:45,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:34:50,089 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:34:53,658 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:34:57,205 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:00,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 72%|█████████████████████████████████████████████████████▌ | 1613/2230 [10:23:31<3:20:57, 19.54s/it] [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:07,890 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:11,348 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:11,348 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:14,901 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:14,901 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:18,422 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:21,880 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:21,880 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:25,369 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:25,369 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:28,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:28,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 72%|█████████████████████████████████████████████████████▌ | 1614/2230 [10:23:59<3:47:08, 22.12s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:04,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 72%|█████████████████████████████████████████████████████▌ | 1614/2230 [10:23:59<3:47:08, 22.12s/it][WARNING|modeling_bart.py:1051] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:35,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:35,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:39,321 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:42,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:42,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:46,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:46,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:49,687 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0735, 'learning_rate': 0.00010699421965317919, 'epoch': 7.24}
{'loss': 0.0702, 'learning_rate': 0.0001068208092485549, 'epoch': 7.25}
{'loss': 0.0634, 'learning_rate': 0.00010664739884393063, 'epoch': 7.25}
{'loss': 0.0637, 'learning_rate': 0.00010647398843930635, 'epoch': 7.26}
{'loss': 0.0591, 'learning_rate': 0.00010630057803468207, 'epoch': 7.26}
{'loss': 0.0628, 'learning_rate': 0.00010612716763005779, 'epoch': 7.26}
{'loss': 0.0501, 'learning_rate': 0.00010595375722543353, 'epoch': 7.27}
{'loss': 0.0643, 'learning_rate': 0.00010578034682080923, 'epoch': 7.27}
{'loss': 0.0555, 'learning_rate': 0.00010560693641618497, 'epoch': 7.28}
[WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 03:35:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
73%|█████████████████████████████████████████████████████▉ | 1624/2230 [10:28:29<4:26:16, 26.36s/it]
{'loss': 0.0606, 'learning_rate': 0.00010543352601156068, 'epoch': 7.28}
{'loss': 0.0447, 'learning_rate': 0.00010526011560693641, 'epoch': 7.29}
{'loss': 0.0522, 'learning_rate': 0.00010508670520231213, 'epoch': 7.29}
{'loss': 0.0526, 'learning_rate': 0.00010491329479768786, 'epoch': 7.3}
{'loss': 0.046, 'learning_rate': 0.00010473988439306357, 'epoch': 7.3}
73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it]
{'loss': 0.0683, 'learning_rate': 0.0001045664739884393, 'epoch': 7.3}
{'loss': 0.0516, 'learning_rate': 0.00010439306358381501, 'epoch': 7.31}
73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0431, 'learning_rate': 0.00010421965317919075, 'epoch': 7.31} 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 73%|██████████████████████████████████████████████████████ | 1629/2230 [10:30:39<4:18:13, 25.78s/it] Setting `use_cache=False`...1] 2022-03-27 03:35:32,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0407, 'learning_rate': 0.00010404624277456647, 'epoch': 7.32}
73%|██████████████████████████████████████████████████████▏ | 1633/2230 [10:32:19<4:11:15, 25.25s/it]
{'loss': 0.0441, 'learning_rate': 0.00010369942196531791, 'epoch': 7.33}
{'loss': 0.0361, 'learning_rate': 0.00010352601156069364, 'epoch': 7.33}
[WARNING|modeling_bart.py:1051] 2022-03-27 03:44:52,001 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 03:45:04,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0417, 'learning_rate': 0.00010335260115606935, 'epoch': 7.34}
[WARNING|modeling_utils.py:388] 2022-03-27 03:45:26,951 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0548, 'learning_rate': 0.00010317919075144509, 'epoch': 7.34}
[WARNING|modeling_bart.py:1051] 2022-03-27 03:45:43,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
73%|██████████████████████████████████████████████████████▎ | 1638/2230 [10:34:21<4:01:30, 24.48s/it]
{'loss': 0.0421, 'learning_rate': 0.00010283236994219653, 'epoch': 7.35}
[WARNING|modeling_utils.py:388] 2022-03-27 03:46:32,803 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
74%|██████████████████████████████████████████████████████▍ | 1640/2230 [10:35:08<3:55:35, 23.96s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0444, 'learning_rate': 0.00010265895953757225, 'epoch': 7.35}
{'loss': 0.037, 'learning_rate': 0.00010248554913294798, 'epoch': 7.36}
74%|██████████████████████████████████████████████████████▍ | 1642/2230 [10:35:54<3:49:51, 23.45s/it]
{'loss': 0.046, 'learning_rate': 0.00010231213872832369, 'epoch': 7.36}
[WARNING|modeling_utils.py:388] 2022-03-27 03:47:48,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0377, 'learning_rate': 0.00010213872832369942, 'epoch': 7.37}
{'loss': 0.043, 'learning_rate': 0.00010196531791907513, 'epoch': 7.37}
[WARNING|modeling_utils.py:388] 2022-03-27 03:48:15,093 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:48:19,139 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:48:23,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0417, 'learning_rate': 0.00010179190751445086, 'epoch': 7.38}
[WARNING|modeling_utils.py:388] 2022-03-27 03:48:39,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
74%|██████████████████████████████████████████████████████▌ | 1646/2230 [10:37:23<3:37:31, 22.35s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 03:48:57,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 03:49:10,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 03:49:16,103 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0432, 'learning_rate': 0.0001014450867052023, 'epoch': 7.39}
[WARNING|modeling_bart.py:1051] 2022-03-27 03:49:28,676 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 03:49:36,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0434, 'learning_rate': 0.00010127167630057803, 'epoch': 7.39}
[WARNING|modeling_utils.py:388] 2022-03-27 03:49:44,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:49:50,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 03:49:55,030 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
74%|██████████████████████████████████████████████████████▋ | 1649/2230 [10:38:24<3:23:55, 21.06s/it]
{'loss': 0.0352, 'learning_rate': 0.00010109826589595376, 'epoch': 7.39}
[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:01,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 03:50:05,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:50:11,120 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:50:13,477 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0402, 'learning_rate': 0.00010092485549132947, 'epoch': 7.4}
[WARNING|modeling_utils.py:388] 2022-03-27 03:50:21,695 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:25,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:31,681 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 03:50:35,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0381, 'learning_rate': 0.0001007514450867052, 'epoch': 7.4}
[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:39,609 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:41,791 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:43,935 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:46,108 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:48,285 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:50,404 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:52,501 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:54,735 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:56,822 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:50:58,873 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 03:51:00,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:04,355 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:06,347 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:08,335 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:10,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:12,387 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:14,324 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:16,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:18,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:19,982 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:21,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:23,693 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:25,610 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:27,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:30,907 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:32,635 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:34,310 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:35,961 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:37,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:41,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:42,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:44,195 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:47,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:48,775 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 03:51:50,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:51:50,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:51:53,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:51:55,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:51:56,809 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:51:59,498 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:00,811 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:52:03,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:03,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:04,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:07,223 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:09,583 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:10,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:10,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:52:13,100 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:15,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:17,425 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:19,520 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:19,520 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:20,539 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:23,589 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:52:25,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:27,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:27,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:29,181 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:31,718 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:33,267 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:34,754 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:52:34,754 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:37,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:37,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:41,004 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:41,004 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:44,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:44,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:52:48,358 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:48,358 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:51,941 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:55,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:55,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:59,139 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:52:59,139 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:53:02,717 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:02,717 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:06,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:06,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:09,962 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:09,962 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:13,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:53:13,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:17,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:17,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:21,596 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:21,596 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:25,119 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:25,119 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:53:28,600 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:32,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:32,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0789, 'learning_rate': 9.867052023121385e-05, 'epoch': 7.46} [WARNING|modeling_utils.py:388] 2022-03-27 03:53:35,801 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:35,801 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:39,275 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:39,275 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:53:42,760 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:46,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:46,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:49,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:49,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:53,238 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:53:56,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:53:56,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:54:00,259 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:54:00,259 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0637, 'learning_rate': 9.849710982658958e-05, 'epoch': 7.46} [WARNING|modeling_utils.py:388] 2022-03-27 03:54:03,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:54:07,215 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:54:07,215 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:54:10,679 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 03:54:14,092 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 03:54:17,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0571, 'learning_rate': 9.83236994219653e-05, 'epoch': 7.47}
{'loss': 0.0686, 'learning_rate': 9.815028901734104e-05, 'epoch': 7.47}
{'loss': 0.0591, 'learning_rate': 9.797687861271675e-05, 'epoch': 7.48}
{'loss': 0.0504, 'learning_rate': 9.780346820809248e-05, 'epoch': 7.48}
{'loss': 0.0571, 'learning_rate': 9.763005780346819e-05, 'epoch': 7.48}
{'loss': 0.056, 'learning_rate': 9.745664739884392e-05, 'epoch': 7.49}
{'loss': 0.0677, 'learning_rate': 9.728323699421964e-05, 'epoch': 7.49}
{'loss': 0.0481, 'learning_rate': 9.710982658959536e-05, 'epoch': 7.5}
75%|███████████████████████████████████████████████████████▌ | 1673/2230 [10:46:32<4:04:57, 26.39s/it]
{'loss': 0.0565, 'learning_rate': 9.693641618497108e-05, 'epoch': 7.5}
{'loss': 0.052, 'learning_rate': 9.676300578034682e-05, 'epoch': 7.51}
{'loss': 0.0489, 'learning_rate': 9.658959537572253e-05, 'epoch': 7.51}
{'loss': 0.0433, 'learning_rate': 9.641618497109826e-05, 'epoch': 7.52}
75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]
{'loss': 0.0491, 'learning_rate': 9.624277456647398e-05, 'epoch': 7.52}
75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... {'loss': 0.0445, 'learning_rate': 9.60693641618497e-05, 'epoch': 7.52} 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0593, 'learning_rate': 9.589595375722542e-05, 'epoch': 7.53} 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0491, 'learning_rate': 9.572254335260116e-05, 'epoch': 7.53} 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▋ | 1677/2230 [10:48:16<3:59:55, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... {'loss': 0.0614, 'learning_rate': 9.554913294797686e-05, 'epoch': 7.54} 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0562, 'learning_rate': 9.53757225433526e-05, 'epoch': 7.54} 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0352, 'learning_rate': 9.52023121387283e-05, 'epoch': 7.55} 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... {'loss': 0.0484, 'learning_rate': 9.502890173410404e-05, 'epoch': 7.55} 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0393, 'learning_rate': 9.485549132947976e-05, 'epoch': 7.56} 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|███████████████████████████████████████████████████████▊ | 1681/2230 [10:49:57<3:51:23, 25.29s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 76%|███████████████████████████████████████████████████████▉ | 1686/2230 [10:52:00<3:42:20, 24.52s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...
 76%|███████████████████████████████████████████████████████▉ | 1686/2230 [10:52:00<3:42:20, 24.52s/it]
{'loss': 0.0397, 'learning_rate': 9.468208092485548e-05, 'epoch': 7.56}
{'loss': 0.0423, 'learning_rate': 9.45086705202312e-05, 'epoch': 7.57}
[WARNING|modeling_utils.py:388] 2022-03-27 04:04:04,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0478, 'learning_rate': 9.433526011560693e-05, 'epoch': 7.57}
 76%|████████████████████████████████████████████████████████ | 1689/2230 [10:53:11<3:36:58, 24.06s/it]
{'loss': 0.0464, 'learning_rate': 9.416184971098264e-05, 'epoch': 7.57}
{'loss': 0.0414, 'learning_rate': 9.398843930635838e-05, 'epoch': 7.58}
{'loss': 0.051, 'learning_rate': 9.38150289017341e-05, 'epoch': 7.58}
[WARNING|modeling_utils.py:388] 2022-03-27 04:05:37,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0405, 'learning_rate': 9.364161849710982e-05, 'epoch': 7.59}
[WARNING|modeling_utils.py:388] 2022-03-27 04:05:59,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0464, 'learning_rate': 9.346820809248554e-05, 'epoch': 7.59}
[WARNING|modeling_utils.py:388] 2022-03-27 04:06:26,491 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:06:30,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
 76%|████████████████████████████████████████████████████████▏ | 1694/2230 [10:55:06<3:26:16, 23.09s/it]
{'loss': 0.0312, 'learning_rate': 9.329479768786127e-05, 'epoch': 7.6}
[WARNING|modeling_utils.py:388] 2022-03-27 04:06:50,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0429, 'learning_rate': 9.312138728323698e-05, 'epoch': 7.6}
[WARNING|modeling_utils.py:388] 2022-03-27 04:07:21,339 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0376, 'learning_rate': 9.294797687861271e-05, 'epoch': 7.61}
[WARNING|modeling_utils.py:388] 2022-03-27 04:07:25,380 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 04:07:37,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0374, 'learning_rate': 9.277456647398842e-05, 'epoch': 7.61}
[WARNING|modeling_utils.py:388] 2022-03-27 04:07:49,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0412, 'learning_rate': 9.260115606936415e-05, 'epoch': 7.61}
[WARNING|modeling_bart.py:1051] 2022-03-27 04:08:06,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 04:08:10,239 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 04:08:18,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 04:08:22,477 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.029, 'learning_rate': 9.242774566473988e-05, 'epoch': 7.62}
[WARNING|modeling_utils.py:388] 2022-03-27 04:08:28,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:08:34,593 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:08:34,593 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:08:38,855 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:08:38,855 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:08:38,855 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:08:38,855 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:08:44,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:08:44,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 04:08:49,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:08:49,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:08:52,977 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:08:55,242 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:08:55,242 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:08:59,316 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 76%|████████████████████████████████████████████████████████▍ | 1701/2230 [10:57:28<2:56:01, 19.97s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
76%|████████████████████████████████████████████████████████▍ | 1701/2230 [10:57:28<2:56:01, 19.97s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:09:03,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:09:05,286 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:09:07,435 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:09:09,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:09:11,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:09:13,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 04:09:13,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:09:17,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:09:17,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:09:19,831 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:09:21,854 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:09:23,859 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:09:25,859 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:27,860 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:09:29,816 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:09:31,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:09:33,652 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:09:33,652 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:09:35,656 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:09:37,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:39,427 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:09:41,314 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:09:43,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:09:44,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:09:48,574 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:09:48,574 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:09:50,439 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 04:09:52,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:09:53,936 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:09:55,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:09:57,356 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:09:59,042 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:02,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:02,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:04,062 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:05,678 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:07,256 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:10,391 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:11,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:13,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:13,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:16,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:18,044 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:20,144 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:21,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:24,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:26,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:26,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:28,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:30,894 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:32,141 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:34,514 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:36,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:36,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:39,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:41,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:42,274 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:44,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:44,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:46,447 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:49,258 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:51,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 04:10:52,882 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:52,882 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:54,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:57,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:59,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:10:59,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0469, 'learning_rate': 9.034682080924854e-05, 'epoch': 7.67} [WARNING|modeling_bart.py:1051] 2022-03-27 04:11:02,790 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:02,790 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:11:06,519 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:11:06,519 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:11:10,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:11:13,854 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:11:13,854 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:11:17,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:17,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:21,117 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:24,680 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:28,256 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:31,919 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:35,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:39,058 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:42,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:47,005 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:50,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:54,047 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:11:57,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:01,201 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:04,678 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:08,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:11,610 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:15,072 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:18,599 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:22,080 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:25,461 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0592, 'learning_rate': 8.982658959537573e-05, 'epoch': 7.69}
[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:29,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:32,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:35,863 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:39,312 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:42,710 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:12:46,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0604, 'learning_rate': 8.965317919075143e-05, 'epoch': 7.69}
 77%|████████████████████████████████████████████████████████▉ | 1716/2230 [11:01:48<3:31:14, 24.66s/it]
{'loss': 0.0669, 'learning_rate': 8.947976878612717e-05, 'epoch': 7.7}
{'loss': 0.0608, 'learning_rate': 8.930635838150287e-05, 'epoch': 7.7}
{'loss': 0.0554, 'learning_rate': 8.913294797687861e-05, 'epoch': 7.7}
 77%|█████████████████████████████████████████████████████████ | 1719/2230 [11:03:10<3:45:38, 26.49s/it]
{'loss': 0.0624, 'learning_rate': 8.895953757225433e-05, 'epoch': 7.71}
 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it]
{'loss': 0.0649, 'learning_rate': 8.878612716763005e-05, 'epoch': 7.71}
77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.057, 'learning_rate': 8.861271676300577e-05, 'epoch': 7.72} 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 77%|█████████████████████████████████████████████████████████ | 1720/2230 [11:03:37<3:45:33, 26.54s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
77%|█████████████████████████████████████████████████████████▏ | 1722/2230 [11:04:30<3:44:06, 26.47s/it]
{'loss': 0.0445, 'learning_rate': 8.84393063583815e-05, 'epoch': 7.72}
{'loss': 0.0504, 'learning_rate': 8.826589595375721e-05, 'epoch': 7.73}
{'loss': 0.0529, 'learning_rate': 8.809248554913295e-05, 'epoch': 7.73}
{'loss': 0.0628, 'learning_rate': 8.791907514450865e-05, 'epoch': 7.74}
{'loss': 0.047, 'learning_rate': 8.774566473988439e-05, 'epoch': 7.74}
{'loss': 0.0438, 'learning_rate': 8.757225433526011e-05, 'epoch': 7.74}
77%|█████████████████████████████████████████████████████████▎ | 1728/2230 [11:07:06<3:36:33, 25.88s/it]
{'loss': 0.0491, 'learning_rate': 8.739884393063584e-05, 'epoch': 7.75}
{'loss': 0.0572, 'learning_rate': 8.722543352601155e-05, 'epoch': 7.75}
{'loss': 0.0363, 'learning_rate': 8.705202312138728e-05, 'epoch': 7.76}
78%|█████████████████████████████████████████████████████████▍ | 1731/2230 [11:08:21<3:30:39, 25.33s/it]
{'loss': 0.0537, 'learning_rate': 8.687861271676299e-05, 'epoch': 7.76}
{'loss': 0.0313, 'learning_rate': 8.670520231213873e-05, 'epoch': 7.77}
{'loss': 0.0508, 'learning_rate': 8.653179190751445e-05, 'epoch': 7.77}
{'loss': 0.043, 'learning_rate': 8.635838150289017e-05, 'epoch': 7.78}
{'loss': 0.0354, 'learning_rate': 8.618497109826589e-05, 'epoch': 7.78}
78%|█████████████████████████████████████████████████████████▌ | 1736/2230 [11:10:24<3:22:18, 24.57s/it]
{'loss': 0.0435, 'learning_rate': 8.601156069364162e-05, 'epoch': 7.78}
{'loss': 0.0427, 'learning_rate': 8.583815028901733e-05, 'epoch': 7.79}
{'loss': 0.0411, 'learning_rate': 8.566473988439306e-05, 'epoch': 7.79}
{'loss': 0.0423, 'learning_rate': 8.549132947976878e-05, 'epoch': 7.8}
[WARNING|modeling_utils.py:388] 2022-03-27 04:23:20,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0406, 'learning_rate': 8.53179190751445e-05, 'epoch': 7.8}
78%|█████████████████████████████████████████████████████████▊ | 1741/2230 [11:12:23<3:13:06, 23.69s/it]
{'loss': 0.0399, 'learning_rate': 8.514450867052023e-05, 'epoch': 7.81}
[WARNING|modeling_utils.py:388] 2022-03-27 04:23:59,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0419, 'learning_rate': 8.497109826589596e-05, 'epoch': 7.81}
[WARNING|modeling_utils.py:388] 2022-03-27 04:24:28,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:24:28,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:24:28,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:24:28,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:24:36,622 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:24:36,622 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:24:36,622 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:24:36,622 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 04:24:36,622 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0363, 'learning_rate': 8.479768786127167e-05, 'epoch': 7.82} [WARNING|modeling_utils.py:388] 2022-03-27 04:24:46,437 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:24:46,437 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:24:50,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:24:50,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:24:54,705 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:24:54,705 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 04:24:54,705 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:24:54,705 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:24:54,705 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:24:54,705 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0392, 'learning_rate': 8.46242774566474e-05, 'epoch': 7.82} [WARNING|modeling_utils.py:388] 2022-03-27 04:24:54,705 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:24:54,705 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:25:10,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 04:25:10,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:25:10,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:25:10,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:25:10,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:25:10,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:25:10,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:25:10,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 04:25:10,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0461, 'learning_rate': 8.445086705202311e-05, 'epoch': 7.83} [WARNING|modeling_bart.py:1051] 2022-03-27 04:25:29,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:25:29,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:25:29,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:25:29,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:25:29,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:25:29,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 04:25:29,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:25:29,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:25:29,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 04:25:29,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|█████████████████████████████████████████████████████████▉ | 1746/2230 [11:14:14<2:59:44, 22.28s/it] Setting `use_cache=False`...e computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:25:49,605 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 04:25:49,605 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 04:25:53,567 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:26:03,951 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
78%|█████████████████████████████████████████████████████████▉ | 1747/2230 [11:14:35<2:56:11, 21.89s/it]
{'loss': 0.0416, 'learning_rate': 8.410404624277456e-05, 'epoch': 7.83}
[WARNING|modeling_utils.py:388] 2022-03-27 04:26:18,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:26:24,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
78%|██████████████████████████████████████████████████████████ | 1748/2230 [11:14:56<2:51:58, 21.41s/it]
{'loss': 0.036, 'learning_rate': 8.393063583815028e-05, 'epoch': 7.84}
[WARNING|modeling_utils.py:388] 2022-03-27 04:26:34,557 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:26:40,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:26:43,166 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:26:49,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0414, 'learning_rate': 8.3757225433526e-05, 'epoch': 7.84}
[WARNING|modeling_utils.py:388] 2022-03-27 04:26:58,973 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:03,216 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0326, 'learning_rate': 8.358381502890174e-05, 'epoch': 7.85}
[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:13,715 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 04:27:17,560 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:27:19,845 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:23,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 04:27:27,828 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:27:30,087 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:27:32,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:36,237 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:38,410 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:40,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 04:27:44,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0486, 'learning_rate': 8.323699421965317e-05, 'epoch': 7.86}
[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:48,052 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:50,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:52,114 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:56,176 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:27:58,154 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:00,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:02,135 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:04,034 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:05,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:07,791 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:09,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:11,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:13,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:15,062 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:18,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:20,388 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:22,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:23,796 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:25,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:28,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:30,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:32,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:35,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:36,670 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:38,153 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:41,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:42,631 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:44,052 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:46,302 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:49,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:50,321 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:52,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:54,321 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:56,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:28:59,129 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:00,283 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:02,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:04,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:06,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:08,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:10,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:12,950 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:14,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:16,649 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:19,390 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:21,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:23,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:24,850 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1026, 'learning_rate': 8.167630057803468e-05, 'epoch': 7.9}
[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:28,250 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:31,872 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:35,390 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:38,912 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:42,463 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:45,999 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:49,589 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:53,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.075, 'learning_rate': 8.150289017341039e-05, 'epoch': 7.9}
[WARNING|modeling_bart.py:1051] 2022-03-27 04:29:56,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:00,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:03,860 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:07,234 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:11,677 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:15,056 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:18,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:21,952 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0526, 'learning_rate': 8.132947976878612e-05, 'epoch': 7.91}
[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:25,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:28,832 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:32,243 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:35,667 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:39,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:42,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:45,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:49,184 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:52,678 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:56,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:30:59,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:31:02,832 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:31:06,187 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:31:09,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:31:12,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:31:16,150 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0562, 'learning_rate': 8.098265895953756e-05, 'epoch': 7.91}
{'loss': 0.0543, 'learning_rate': 8.080924855491328e-05, 'epoch': 7.92}
79%|██████████████████████████████████████████████████████████▋ | 1767/2230 [11:20:37<3:10:37, 24.70s/it]
{'loss': 0.0638, 'learning_rate': 8.063583815028902e-05, 'epoch': 7.92}
{'loss': 0.0364, 'learning_rate': 8.046242774566472e-05, 'epoch': 7.93}
{'loss': 0.0621, 'learning_rate': 8.028901734104046e-05, 'epoch': 7.93}
79%|██████████████████████████████████████████████████████████▋ | 1770/2230 [11:21:54<3:14:14, 25.34s/it]
{'loss': 0.0567, 'learning_rate': 8.011560693641617e-05, 'epoch': 7.94}
79%|██████████████████████████████████████████████████████████▊ | 1771/2230 [11:22:19<3:12:44, 25.20s/it]
{'loss': 0.0482, 'learning_rate': 7.99421965317919e-05, 'epoch': 7.94}
79%|██████████████████████████████████████████████████████████▊ | 1772/2230 [11:22:43<3:10:19, 24.93s/it]
{'loss': 0.0518, 'learning_rate': 7.976878612716762e-05, 'epoch': 7.95}
{'loss': 0.0447, 'learning_rate': 7.959537572254334e-05, 'epoch': 7.95}
{'loss': 0.0483, 'learning_rate': 7.942196531791906e-05, 'epoch': 7.96}
{'loss': 0.0337, 'learning_rate': 7.92485549132948e-05, 'epoch': 7.96}
[WARNING|modeling_utils.py:388] 2022-03-27 04:35:52,706 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:35:56,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:36:00,825 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0472, 'learning_rate': 7.890173410404624e-05, 'epoch': 7.97}
[WARNING|modeling_utils.py:388] 2022-03-27 04:36:16,870 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:36:27,307 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0442, 'learning_rate': 7.872832369942196e-05, 'epoch': 7.97}
[WARNING|modeling_utils.py:388] 2022-03-27 04:36:37,641 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 04:36:45,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 04:36:49,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0333, 'learning_rate': 7.855491329479768e-05, 'epoch': 7.98}
[WARNING|modeling_utils.py:388] 2022-03-27 04:36:55,985 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:36:58,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:02,461 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:05,785 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:07,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:10,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:12,247 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:14,258 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:16,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:18,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:20,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:21,913 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:23,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:25,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:27,409 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:29,098 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:31,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:34,881 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:36,404 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:37,880 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:40,738 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:42,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:44,534 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:46,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:48,871 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:50,848 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:52,641 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:54,323 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:55,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:37:59,338 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:03,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:06,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:10,390 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:13,984 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:17,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:21,054 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:24,623 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0672, 'learning_rate': 7.751445086705202e-05, 'epoch': 8.0}
[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:28,276 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:31,804 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:35,342 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:38:38,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0566, 'learning_rate': 7.734104046242774e-05, 'epoch': 8.01}
{'loss': 0.0577, 'learning_rate': 7.716763005780346e-05, 'epoch': 8.01}
{'loss': 0.0516, 'learning_rate': 7.699421965317918e-05, 'epoch': 8.02}
{'loss': 0.1426, 'learning_rate': 7.682080924855491e-05, 'epoch': 8.02}
80%|███████████████████████████████████████████████████████████▍ | 1790/2230 [11:29:12<3:10:49, 26.02s/it]
{'loss': 0.0846, 'learning_rate': 7.647398843930635e-05, 'epoch': 8.03}
{'loss': 0.0666, 'learning_rate': 7.630057803468207e-05, 'epoch': 8.04}
{'loss': 0.0505, 'learning_rate': 7.61271676300578e-05, 'epoch': 8.04}
{'loss': 0.0439, 'learning_rate': 7.595375722543352e-05, 'epoch': 8.04}
{'loss': 0.0355, 'learning_rate': 7.578034682080925e-05, 'epoch': 8.05}
{'loss': 0.0469, 'learning_rate': 7.560693641618496e-05, 'epoch': 8.05}
{'loss': 0.0565, 'learning_rate': 7.543352601156069e-05, 'epoch': 8.06}
{'loss': 0.0368, 'learning_rate': 7.52601156069364e-05, 'epoch': 8.06}
81%|███████████████████████████████████████████████████████████▋ | 1799/2230 [11:33:10<3:06:31, 25.97s/it]
{'loss': 0.0524, 'learning_rate': 7.508670520231213e-05, 'epoch': 8.07}
{'loss': 0.0369, 'learning_rate': 7.491329479768785e-05, 'epoch': 8.07}
{'loss': 0.0475, 'learning_rate': 7.473988439306357e-05, 'epoch': 8.08}
{'loss': 0.049, 'learning_rate': 7.45664739884393e-05, 'epoch': 8.08}
81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it]
{'loss': 0.0297, 'learning_rate': 7.421965317919074e-05, 'epoch': 8.09}
{'loss': 0.099, 'learning_rate': 7.404624277456646e-05, 'epoch': 8.09}
{'loss': 0.0438, 'learning_rate': 7.387283236994219e-05, 'epoch': 8.1}
{'loss': 0.1455, 'learning_rate': 7.369942196531791e-05, 'epoch': 8.1}
{'loss': 0.0296, 'learning_rate': 7.352601156069363e-05, 'epoch': 8.11}
81%|███████████████████████████████████████████████████████████▊ | 1803/2230 [11:34:52<3:02:22, 25.63s/it]
{'loss': 0.0297, 'learning_rate': 7.335260115606935e-05, 'epoch': 8.11}
[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0351, 'learning_rate': 7.317919075144507e-05, 'epoch': 8.12}
[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1393, 'learning_rate': 7.30057803468208e-05, 'epoch': 8.12}
{'loss': 0.1093, 'learning_rate': 7.30057803468208e-05, 'epoch': 8.13}
[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.08, 'learning_rate': 7.30057803468208e-05, 'epoch': 8.13}
[WARNING|modeling_utils.py:388] 2022-03-27 04:49:09,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0995, 'learning_rate': 7.283236994219653e-05, 'epoch': 8.13}
[WARNING|modeling_utils.py:388] 2022-03-27 04:50:58,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:50:58,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0839, 'learning_rate': 7.265895953757225e-05, 'epoch': 8.14}
[WARNING|modeling_utils.py:388] 2022-03-27 04:50:58,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:51:28,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:51:33,170 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2315, 'learning_rate': 7.248554913294797e-05, 'epoch': 8.14}
[WARNING|modeling_utils.py:388] 2022-03-27 04:51:42,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:51:47,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2594, 'learning_rate': 7.231213872832369e-05, 'epoch': 8.15}
[WARNING|modeling_utils.py:388] 2022-03-27 04:51:47,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:52:07,459 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:52:07,459 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
82%|████████████████████████████████████████████████████████████▎ | 1818/2230 [11:40:47<2:33:43, 22.39s/it]
{'loss': 0.1639, 'learning_rate': 7.213872832369941e-05, 'epoch': 8.15}
82%|████████████████████████████████████████████████████████████▎ | 1818/2230 [11:40:47<2:33:43, 22.39s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 04:52:40,295 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:52:40,295 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:52:44,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:52:52,223 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:53:02,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1424, 'learning_rate': 7.179190751445085e-05, 'epoch': 8.16}
[WARNING|modeling_bart.py:1051] 2022-03-27 04:53:14,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 04:53:22,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3705, 'learning_rate': 7.161849710982659e-05, 'epoch': 8.17}
[WARNING|modeling_utils.py:388] 2022-03-27 04:53:29,239 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:53:35,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 04:53:39,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:53:46,033 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 04:53:50,091 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:53:56,065 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 04:54:00,351 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
82%|████████████████████████████████████████████████████████████▍ | 1823/2230 [11:42:30<2:18:40, 20.44s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 04:54:04,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:54:10,289 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:54:12,557 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 04:54:16,664 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:54:18,896 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
82%|████████████████████████████████████████████████████████████▌ | 1824/2230 [11:42:48<2:14:16, 19.84s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 04:54:22,767 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:54:24,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:54:27,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 04:54:31,201 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:54:33,337 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:54:35,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 04:54:37,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.184, 'learning_rate': 7.092485549132947e-05, 'epoch': 8.18}
[WARNING|modeling_utils.py:388] 2022-03-27 04:54:43,289 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:54:45,341 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:54:47,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:54:49,427 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:54:51,410 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:54:53,442 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:54:55,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:54:57,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:54:59,442 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:01,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:03,205 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:05,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:06,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:08,711 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:10,512 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:12,409 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:14,180 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:17,589 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:19,248 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:20,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:22,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:25,871 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:27,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:28,994 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:32,079 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:33,618 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:35,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:38,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:39,528 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:42,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:43,669 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:46,322 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:47,589 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:50,179 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:51,397 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:53,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:56,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:55:58,539 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:56:00,478 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:56:02,610 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:56:03,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:56:05,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:56:07,884 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:56:10,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:56:12,475 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:56:14,265 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:56:16,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:56:18,566 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:56:20,792 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1203, 'learning_rate': 6.936416184971097e-05, 'epoch': 8.22}
[WARNING|modeling_utils.py:388] 2022-03-27 04:56:24,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:56:28,327 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 04:56:31,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1833, 'learning_rate': 6.901734104046242e-05, 'epoch': 8.23}
{'loss': 0.1643, 'learning_rate': 6.884393063583815e-05, 'epoch': 8.24}
{'loss': 0.1469, 'learning_rate': 6.867052023121387e-05, 'epoch': 8.24}
{'loss': 0.0913, 'learning_rate': 6.849710982658959e-05, 'epoch': 8.25}
{'loss': 0.0403, 'learning_rate': 6.832369942196531e-05, 'epoch': 8.25}
{'loss': 0.096, 'learning_rate': 6.815028901734103e-05, 'epoch': 8.26}
83%|█████████████████████████████████████████████████████████████ | 1842/2230 [11:48:31<2:50:12, 26.32s/it]
{'loss': 0.2121, 'learning_rate': 6.780346820809248e-05, 'epoch': 8.26}
{'loss': 0.1506, 'learning_rate': 6.76300578034682e-05, 'epoch': 8.27}
{'loss': 0.0472, 'learning_rate': 6.745664739884392e-05, 'epoch': 8.27}
{'loss': 0.0873, 'learning_rate': 6.728323699421964e-05, 'epoch': 8.28}
{'loss': 0.1552, 'learning_rate': 6.710982658959537e-05, 'epoch': 8.28}
{'loss': 0.047, 'learning_rate': 6.693641618497109e-05, 'epoch': 8.29}
83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]
{'loss': 0.0891, 'learning_rate': 6.658959537572254e-05, 'epoch': 8.3}
{'loss': 0.0929, 'learning_rate': 6.641618497109826e-05, 'epoch': 8.3}
{'loss': 0.0883, 'learning_rate': 6.624277456647398e-05, 'epoch': 8.3}
{'loss': 0.0795, 'learning_rate': 6.60693641618497e-05, 'epoch': 8.31}
{'loss': 0.1435, 'learning_rate': 6.589595375722542e-05, 'epoch': 8.31}
{'loss': 0.1741, 'learning_rate': 6.572254335260114e-05, 'epoch': 8.32}
{'loss': 0.0961, 'learning_rate': 6.554913294797688e-05, 'epoch': 8.32}
83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 83%|█████████████████████████████████████████████████████████████▎ | 1849/2230 [11:51:35<2:45:16, 26.03s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:06:27,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:06:27,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:06:27,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.1966, 'learning_rate': 6.53757225433526e-05, 'epoch': 8.33}
[WARNING|modeling_utils.py:388] 2022-03-27 05:06:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1273, 'learning_rate': 6.520231213872832e-05, 'epoch': 8.33}
{'loss': 0.0822, 'learning_rate': 6.502890173410404e-05, 'epoch': 8.34}
83%|█████████████████████████████████████████████████████████████▋ | 1860/2230 [11:56:10<2:30:22, 24.38s/it]
{'loss': 0.0902, 'learning_rate': 6.485549132947976e-05, 'epoch': 8.34}
{'loss': 0.0709, 'learning_rate': 6.468208092485548e-05, 'epoch': 8.35}
83%|█████████████████████████████████████████████████████████████▊ | 1862/2230 [11:56:57<2:26:48, 23.94s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:08:48,784 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0312, 'learning_rate': 6.433526011560694e-05, 'epoch': 8.35}
[WARNING|modeling_utils.py:388] 2022-03-27 05:09:00,836 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]
{'loss': 0.0409, 'learning_rate': 6.416184971098266e-05, 'epoch': 8.36}
84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0308, 'learning_rate': 6.398843930635838e-05, 'epoch': 8.36} 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0323, 'learning_rate': 6.38150289017341e-05, 'epoch': 8.37} 84%|█████████████████████████████████████████████████████████████▊ | 1864/2230 [11:57:44<2:24:37, 23.71s/it]g-point operations will not be computed-27 03:46:41,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 05:10:13,149 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0841, 'learning_rate': 6.364161849710982e-05, 'epoch': 8.37}
[WARNING|modeling_utils.py:388] 2022-03-27 05:10:29,703 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 05:10:33,774 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 05:10:45,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.086, 'learning_rate': 6.346820809248554e-05, 'epoch': 8.38}
{'loss': 0.0813, 'learning_rate': 6.329479768786126e-05, 'epoch': 8.38}
[WARNING|modeling_utils.py:388] 2022-03-27 05:11:14,915 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 05:11:31,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 05:11:36,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 05:11:43,105 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 05:11:49,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0999, 'learning_rate': 6.294797687861272e-05, 'epoch': 8.39}
[WARNING|modeling_utils.py:388] 2022-03-27 05:11:55,735 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 05:12:05,594 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
84%|██████████████████████████████████████████████████████████████ | 1872/2230 [12:00:37<2:05:14, 20.99s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 05:12:11,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 05:12:16,104 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 05:12:20,097 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 05:12:26,052 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.038, 'learning_rate': 6.260115606936416e-05, 'epoch': 8.4}
[WARNING|modeling_utils.py:388] 2022-03-27 05:12:32,028 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 05:12:34,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 05:12:38,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 05:12:42,369 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 05:12:44,594 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 05:12:46,760 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0383, 'learning_rate': 6.242774566473988e-05, 'epoch': 8.4}
[WARNING|modeling_bart.py:1051] 2022-03-27 05:12:50,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:12:52,962 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:12:55,114 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:12:57,207 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:12:59,308 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:01,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:03,466 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:07,497 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:09,500 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:11,474 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:13,485 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:15,418 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:17,332 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:19,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
84%|██████████████████████████████████████████████████████████████▎ | 1876/2230 [12:01:48<1:47:41, 18.25s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:21,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:23,222 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:25,067 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:26,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:28,762 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:30,580 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:34,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
84%|██████████████████████████████████████████████████████████████▎ | 1877/2230 [12:02:03<1:41:09, 17.19s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:35,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:37,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:39,416 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:41,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:42,825 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:46,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:47,734 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
84%|██████████████████████████████████████████████████████████████▎ | 1878/2230 [12:02:16<1:34:23, 16.09s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:49,434 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:51,007 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:54,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:55,601 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:57,104 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:13:59,978 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 84%|██████████████████████████████████████████████████████████████▎ | 1879/2230 [12:02:28<1:27:07, 14.89s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:01,474 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:04,206 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:05,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:08,134 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:10,646 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 84%|██████████████████████████████████████████████████████████████▍ | 1880/2230 [12:02:39<1:19:20, 13.60s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:11,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:14,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:16,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:18,972 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 84%|██████████████████████████████████████████████████████████████▍ | 1881/2230 [12:02:48<1:11:38, 12.32s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:21,295 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:22,358 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:24,204 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:26,297 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:28,291 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 84%|██████████████████████████████████████████████████████████████▍ | 1882/2230 [12:02:58<1:05:51, 11.35s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:30,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:32,270 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:34,094 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:36,735 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:37,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:38,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:40,232 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:42,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 84%|████████████████████████████████████████████████████████████████▏ | 1884/2230 [12:03:11<52:13, 9.06s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:45,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:48,991 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:52,629 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:56,289 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:14:59,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:03,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:07,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:10,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 85%|██████████████████████████████████████████████████████████████▌ | 1885/2230 [12:03:41<1:26:55, 15.12s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:14,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:17,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:21,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:24,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:28,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:31,907 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:35,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:38,907 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 85%|██████████████████████████████████████████████████████████████▌ | 1886/2230 [12:04:09<1:49:11, 19.05s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:42,535 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:45,984 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:49,482 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:53,006 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:15:56,561 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:00,054 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:03,581 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:07,026 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 85%|██████████████████████████████████████████████████████████████▌ | 1887/2230 [12:04:37<2:04:24, 21.76s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:13,981 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:17,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:20,901 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:25,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:28,822 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0413, 'learning_rate': 5.9999999999999995e-05, 'epoch': 8.47}
{'loss': 0.0408, 'learning_rate': 5.982658959537572e-05, 'epoch': 8.47}
{'loss': 0.0403, 'learning_rate': 5.965317919075144e-05, 'epoch': 8.48}
[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.042, 'learning_rate': 5.9479768786127164e-05, 'epoch': 8.48} [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0374, 'learning_rate': 5.930635838150289e-05, 'epoch': 8.48} [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0483, 'learning_rate': 5.913294797687861e-05, 'epoch': 8.49} [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:16:32,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
85%|██████████████████████████████████████████████████████████████▊ | 1894/2230 [12:07:48<2:30:00, 26.79s/it]
{'loss': 0.0782, 'learning_rate': 5.895953757225433e-05, 'epoch': 8.49}
{'loss': 0.087, 'learning_rate': 5.878612716763006e-05, 'epoch': 8.5}
[WARNING|modeling_utils.py:388] 2022-03-27 05:20:16,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0401, 'learning_rate': 5.84393063583815e-05, 'epoch': 8.51}
[WARNING|modeling_utils.py:388] 2022-03-27 05:20:16,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:20:16,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:20:16,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:20:16,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:20:16,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0318, 'learning_rate': 5.8265895953757215e-05, 'epoch': 8.51} 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0416, 'learning_rate': 5.8092485549132936e-05, 'epoch': 8.52} 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|██████████████████████████████████████████████████████████████▉ | 1898/2230 [12:09:33<2:24:53, 26.19s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0305, 'learning_rate': 5.791907514450866e-05, 'epoch': 8.52}
{'loss': 0.0404, 'learning_rate': 5.7745664739884384e-05, 'epoch': 8.52}
{'loss': 0.0756, 'learning_rate': 5.7572254335260105e-05, 'epoch': 8.53}
85%|███████████████████████████████████████████████████████████████▏ | 1903/2230 [12:11:40<2:18:47, 25.47s/it]
{'loss': 0.0329, 'learning_rate': 5.739884393063583e-05, 'epoch': 8.53}
{'loss': 0.0344, 'learning_rate': 5.722543352601155e-05, 'epoch': 8.54}
{'loss': 0.0259, 'learning_rate': 5.705202312138727e-05, 'epoch': 8.54}
{'loss': 0.0342, 'learning_rate': 5.6878612716762994e-05, 'epoch': 8.55}
86%|███████████████████████████████████████████████████████████████▎ | 1907/2230 [12:13:20<2:15:11, 25.11s/it]
{'loss': 0.0425, 'learning_rate': 5.670520231213872e-05, 'epoch': 8.55}
{'loss': 0.0411, 'learning_rate': 5.653179190751444e-05, 'epoch': 8.56}
{'loss': 0.0291, 'learning_rate': 5.635838150289016e-05, 'epoch': 8.56}
86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]
{'loss': 0.0399, 'learning_rate': 5.618497109826589e-05, 'epoch': 8.57}
86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0319, 'learning_rate': 5.601156069364161e-05, 'epoch': 8.57} 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1910/2230 [12:14:32<2:10:11, 24.41s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0314, 'learning_rate': 5.583815028901733e-05, 'epoch': 8.57} 86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1912/2230 [12:15:19<2:07:05, 23.98s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1913/2230 [12:15:44<2:07:11, 24.07s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1913/2230 [12:15:44<2:07:11, 24.07s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0389, 'learning_rate': 5.566473988439306e-05, 'epoch': 8.58} 86%|███████████████████████████████████████████████████████████████▍ | 1913/2230 [12:15:44<2:07:11, 24.07s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
86%|███████████████████████████████████████████████████████████████▍ | 1913/2230 [12:15:44<2:07:11, 24.07s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1913/2230 [12:15:44<2:07:11, 24.07s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1913/2230 [12:15:44<2:07:11, 24.07s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1913/2230 [12:15:44<2:07:11, 24.07s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1913/2230 [12:15:44<2:07:11, 24.07s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1913/2230 [12:15:44<2:07:11, 24.07s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▍ | 1913/2230 [12:15:44<2:07:11, 24.07s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
86%|███████████████████████████████████████████████████████████████▍ | 1913/2230 [12:15:44<2:07:11, 24.07s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.034, 'learning_rate': 5.549132947976878e-05, 'epoch': 8.58} [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0278, 'learning_rate': 5.53179190751445e-05, 'epoch': 8.59} [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0305, 'learning_rate': 5.514450867052022e-05, 'epoch': 8.59} [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0461, 'learning_rate': 5.497109826589595e-05, 'epoch': 8.6} [WARNING|modeling_utils.py:388] 2022-03-27 05:27:39,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:28:52,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:28:52,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:28:57,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:28:57,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:28:57,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 05:28:57,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:28:57,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:28:57,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▋ | 1918/2230 [12:17:37<1:57:47, 22.65s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|███████████████████████████████████████████████████████████████▋ | 1918/2230 [12:17:37<1:57:47, 22.65s/it]g-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0264, 'learning_rate': 5.479768786127167e-05, 'epoch': 8.6} [WARNING|modeling_utils.py:388] 2022-03-27 05:29:13,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:29:13,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 05:29:13,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:29:13,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:29:13,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:29:13,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:29:13,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:29:13,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:29:13,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 05:29:31,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:29:31,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0278, 'learning_rate': 5.462427745664739e-05, 'epoch': 8.61} [WARNING|modeling_utils.py:388] 2022-03-27 05:29:31,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:29:31,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:29:31,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:29:31,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:29:31,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 05:29:46,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:29:46,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:29:46,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:29:46,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:29:46,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0276, 'learning_rate': 5.445086705202312e-05, 'epoch': 8.61} [WARNING|modeling_utils.py:388] 2022-03-27 05:29:46,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:29:46,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 05:29:46,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:29:46,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:29:46,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:29:46,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:30:08,245 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:30:08,245 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:30:08,245 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:16:10,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 05:30:08,245 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.0338, 'learning_rate': 5.427745664739884e-05, 'epoch': 8.61} [WARNING|modeling_utils.py:388] 2022-03-27 05:30:18,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-27 05:30:24,618 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 05:30:24,618 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-27 05:30:30,818 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-27 05:30:37,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 05:30:37,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-27 05:30:43,050 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-27 05:30:45,485 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
 86%|███████████████████████████████████████████████████████████████▊ | 1923/2230 [12:19:20<1:45:37, 20.64s/it] [WARNING|modeling_bart.py:1051] 2022-03-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:30:59,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:31:03,322 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-27 05:31:05,652 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 05:31:05,652 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-27 05:31:11,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 05:31:15,644 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:31:17,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:31:17,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:31:21,628 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-27 05:31:23,795 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-27 05:31:25,947 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-27 05:31:28,074 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.0265, 'learning_rate': 5.358381502890173e-05, 'epoch': 8.63} [WARNING|modeling_utils.py:388] 2022-03-27 05:31:32,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 05:31:34,365 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 05:31:38,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:31:40,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:31:42,197 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:31:44,200 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:31:46,210 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:31:46,210 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:31:48,263 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:31:50,207 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:31:52,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:31:54,029 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:31:55,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:31:57,793 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:31:59,639 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:01,481 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:03,425 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:05,239 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:08,851 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:10,634 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:12,340 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:14,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:15,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:19,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:20,730 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:22,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:23,890 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:26,980 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:28,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:30,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:32,998 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:34,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:37,184 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:38,496 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:39,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:42,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:45,023 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:46,222 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:48,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:50,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:52,058 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:54,989 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:57,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:32:59,067 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:32:59,067 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:33:01,107 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:33:02,938 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:33:05,592 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:33:07,440 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:33:09,114 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:11,430 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:33:12,902 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0374, 'learning_rate': 5.2023121387283234e-05, 'epoch': 8.67} [WARNING|modeling_bart.py:1051] 2022-03-27 05:33:16,258 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:33:19,980 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:33:23,637 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:23,637 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:33:27,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:33:30,906 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:33:34,452 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:33:37,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:37,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:33:41,578 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0673, 'learning_rate': 5.1849710982658955e-05, 'epoch': 8.68} [WARNING|modeling_bart.py:1051] 2022-03-27 05:33:45,252 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:33:48,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:33:52,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:33:55,952 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:33:59,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:34:02,995 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:34:06,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:10,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0524, 'learning_rate': 5.1676300578034675e-05, 'epoch': 8.68} [WARNING|modeling_bart.py:1051] 2022-03-27 05:34:13,594 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:34:17,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:34:20,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:24,000 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:34:27,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:34:30,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:34:34,459 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:37,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:34:41,377 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:34:44,863 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:34:48,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:34:51,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:34:56,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0507, 'learning_rate': 5.1329479768786124e-05, 'epoch': 8.69}
{'loss': 0.0441, 'learning_rate': 5.1156069364161844e-05, 'epoch': 8.7}
{'loss': 0.0386, 'learning_rate': 5.0982658959537565e-05, 'epoch': 8.7}
{'loss': 0.0389, 'learning_rate': 5.080924855491329e-05, 'epoch': 8.7}
{'loss': 0.0461, 'learning_rate': 5.063583815028901e-05, 'epoch': 8.71}
{'loss': 0.0384, 'learning_rate': 5.0462427745664734e-05, 'epoch': 8.71}
{'loss': 0.034, 'learning_rate': 5.028901734104046e-05, 'epoch': 8.72}
 87%|████████████████████████████████████████████████████████████████▌ | 1945/2230 [12:26:44<2:06:22, 26.61s/it]
{'loss': 0.029, 'learning_rate': 5.011560693641618e-05, 'epoch': 8.72}
{'loss': 0.0438, 'learning_rate': 4.99421965317919e-05, 'epoch': 8.73}
{'loss': 0.0402, 'learning_rate': 4.976878612716762e-05, 'epoch': 8.73}
{'loss': 0.034, 'learning_rate': 4.959537572254335e-05, 'epoch': 8.74}
{'loss': 0.0376, 'learning_rate': 4.942196531791907e-05, 'epoch': 8.74}
{'loss': 0.03, 'learning_rate': 4.924855491329479e-05, 'epoch': 8.74}
{'loss': 0.0375, 'learning_rate': 4.907514450867052e-05, 'epoch': 8.75}
 88%|████████████████████████████████████████████████████████████████▊ | 1952/2230 [12:29:44<1:59:05, 25.70s/it]
{'loss': 0.044, 'learning_rate': 4.890173410404624e-05, 'epoch': 8.75}
{'loss': 0.0438, 'learning_rate': 4.872832369942196e-05, 'epoch': 8.76}
{'loss': 0.0375, 'learning_rate': 4.855491329479768e-05, 'epoch': 8.76}
{'loss': 0.0526, 'learning_rate': 4.838150289017341e-05, 'epoch': 8.77}
{'loss': 0.0449, 'learning_rate': 4.820809248554913e-05, 'epoch': 8.77}
{'loss': 0.0414, 'learning_rate': 4.803468208092485e-05, 'epoch': 8.78}
[WARNING|modeling_utils.py:388] 2022-03-27 05:43:44,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.031, 'learning_rate': 4.786127167630058e-05, 'epoch': 8.78}
{'loss': 0.0402, 'learning_rate': 4.76878612716763e-05, 'epoch': 8.78}
{'loss': 0.0268, 'learning_rate': 4.751445086705202e-05, 'epoch': 8.79}
{'loss': 0.036, 'learning_rate': 4.734104046242774e-05, 'epoch': 8.79}
88%|█████████████████████████████████████████████████████████████████ | 1962/2230 [12:33:49<1:46:57, 23.94s/it]
{'loss': 0.0352, 'learning_rate': 4.716763005780347e-05, 'epoch': 8.8}
88%|█████████████████████████████████████████████████████████████████▏ | 1963/2230 [12:34:13<1:47:12, 24.09s/it]
{'loss': 0.0335, 'learning_rate': 4.699421965317919e-05, 'epoch': 8.8}
[WARNING|modeling_utils.py:388] 2022-03-27 05:46:08,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 05:46:08,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:46:08,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:46:08,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:46:08,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:46:08,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:46:08,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 05:46:08,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:30:53,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 05:46:31,291 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0366, 'learning_rate': 4.6647398843930636e-05, 'epoch': 8.81}
[WARNING|modeling_utils.py:388] 2022-03-27 05:46:53,865 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0244, 'learning_rate': 4.647398843930636e-05, 'epoch': 8.82}
88%|█████████████████████████████████████████████████████████████████▎ | 1967/2230 [12:35:44<1:40:32, 22.94s/it]
{'loss': 0.0418, 'learning_rate': 4.630057803468208e-05, 'epoch': 8.82}
[WARNING|modeling_utils.py:388] 2022-03-27 05:47:35,106 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0453, 'learning_rate': 4.61271676300578e-05, 'epoch': 8.83}
[WARNING|modeling_utils.py:388] 2022-03-27 05:47:47,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 05:47:57,410 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0253, 'learning_rate': 4.5953757225433526e-05, 'epoch': 8.83}
[WARNING|modeling_utils.py:388] 2022-03-27 05:48:11,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0302, 'learning_rate': 4.5780346820809246e-05, 'epoch': 8.83}
[WARNING|modeling_bart.py:1051] 2022-03-27 05:48:24,555 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 05:48:30,151 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 05:48:38,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0309, 'learning_rate': 4.560693641618497e-05, 'epoch': 8.84}
[WARNING|modeling_utils.py:388] 2022-03-27 05:48:46,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 05:48:52,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 05:48:56,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0245, 'learning_rate': 4.5433526011560694e-05, 'epoch': 8.84}
[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:04,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 05:49:09,033 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 05:49:15,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 05:49:20,972 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 05:49:23,405 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:27,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 05:49:31,456 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 05:49:33,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:37,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
89%|█████████████████████████████████████████████████████████████████▌ | 1974/2230 [12:38:07<1:24:57, 19.91s/it]
{'loss': 0.0278, 'learning_rate': 4.5086705202312136e-05, 'epoch': 8.85}
[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:43,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:45,652 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:47,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:49,921 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:52,011 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:54,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:49:56,187 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:00,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:02,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:04,424 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:06,478 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:08,509 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:10,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:12,429 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
89%|█████████████████████████████████████████████████████████████████▌ | 1976/2230 [12:38:41<1:18:09, 18.46s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:14,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
89%|█████████████████████████████████████████████████████████████████▌ | 1976/2230 [12:38:41<1:18:09, 18.46s/it][WARNING|modeling_bart.py:1051] 2022-03-27 05:50:14,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:50:16,428 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:14,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:50:18,312 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:14,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:50:20,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:14,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:50:22,093 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:14,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:50:23,962 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:14,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 05:50:25,801 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-27 05:50:14,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
89%|█████████████████████████████████████████████████████████████████▌ | 1977/2230 [12:38:56<1:13:37, 17.46s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:29,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:31,340 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:33,107 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:34,857 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:36,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:38,310 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:39,972 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
89%|█████████████████████████████████████████████████████████████████▋ | 1978/2230 [12:39:10<1:08:50, 16.39s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:43,392 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:45,049 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:46,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:48,292 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:49,885 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:52,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:54,510 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
89%|█████████████████████████████████████████████████████████████████▋ | 1979/2230 [12:39:23<1:04:03, 15.31s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:56,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:50:59,061 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:00,495 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:03,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:04,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
89%|███████████████████████████████████████████████████████████████████▍ | 1980/2230 [12:39:35<58:56, 14.15s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:07,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:08,875 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:11,496 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:12,737 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:15,222 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
89%|███████████████████████████████████████████████████████████████████▌ | 1981/2230 [12:39:45<53:50, 12.97s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:17,677 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:18,816 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:21,811 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:23,954 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:25,005 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
89%|███████████████████████████████████████████████████████████████████▌ | 1982/2230 [12:39:54<49:18, 11.93s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:27,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:29,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:31,928 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:33,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:34,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:35,613 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:37,295 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:39,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
89%|███████████████████████████████████████████████████████████████████▌ | 1984/2230 [12:40:08<38:34, 9.41s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:42,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:45,798 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:49,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:53,035 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:51:56,561 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:00,071 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:03,568 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:07,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
89%|█████████████████████████████████████████████████████████████████▊ | 1985/2230 [12:40:37<1:01:54, 15.16s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:10,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:14,189 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:17,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:21,187 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:24,623 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:28,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:31,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:34,954 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
89%|█████████████████████████████████████████████████████████████████▉ | 1986/2230 [12:41:05<1:17:05, 18.96s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:38,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:41,870 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:45,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:48,616 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:51,994 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:55,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:52:58,828 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:02,216 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
89%|█████████████████████████████████████████████████████████████████▉ | 1987/2230 [12:41:32<1:26:47, 21.43s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.05, 'learning_rate': 4.283236994219653e-05, 'epoch': 8.91}
[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:08,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:12,342 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:15,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:20,038 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 05:53:23,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0402, 'learning_rate': 4.265895953757225e-05, 'epoch': 8.91}
{'loss': 0.0487, 'learning_rate': 4.248554913294798e-05, 'epoch': 8.92}
89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it]
89%|██████████████████████████████████████████████████████████████████ | 1990/2230 [12:42:52<1:39:07, 24.78s/it]
{'loss': 0.0366, 'learning_rate': 4.23121387283237e-05, 'epoch': 8.92}
{'loss': 0.0349, 'learning_rate': 4.213872832369942e-05, 'epoch': 8.93}
{'loss': 0.0305, 'learning_rate': 4.196531791907514e-05, 'epoch': 8.93}
89%|██████████████████████████████████████████████████████████████████▏ | 1993/2230 [12:44:09<1:39:16, 25.13s/it]
{'loss': 0.0443, 'learning_rate': 4.179190751445087e-05, 'epoch': 8.94}
{'loss': 0.0418, 'learning_rate': 4.161849710982658e-05, 'epoch': 8.94}
{'loss': 0.03, 'learning_rate': 4.1445086705202304e-05, 'epoch': 8.95}
[WARNING|modeling_utils.py:388] 2022-03-27 05:56:38,711 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 05:56:49,072 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0308, 'learning_rate': 4.1271676300578025e-05, 'epoch': 8.95}
{'loss': 0.0293, 'learning_rate': 4.109826589595375e-05, 'epoch': 8.96}
{'loss': 0.0419, 'learning_rate': 4.092485549132947e-05, 'epoch': 8.96}
90%|██████████████████████████████████████████████████████████████████▎ | 1999/2230 [12:46:31<1:30:18, 23.46s/it]
{'loss': 0.0258, 'learning_rate': 4.075144508670519e-05, 'epoch': 8.96}
[WARNING|modeling_bart.py:1051] 2022-03-27 05:58:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 05:58:14,845 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642
{'loss': 0.0291, 'learning_rate': 4.057803468208092e-05, 'epoch': 8.97}
[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 05:58:26,605 >>   Num examples = 2642
03/27/2022 06:07:57 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow
{'eval_loss': 0.3589690625667572, 'eval_wer': 0.09641015470051567, 'eval_runtime': 570.6282, 'eval_samples_per_second': 4.63, 'eval_steps_per_second': 0.58, 'epoch': 8.97}
03/27/2022 06:09:26 - WARNING - huggingface_hub.repository - Adding files tracked by Git LFS: ['wandb/run-20220326_171130-bdf5nvyg/logs/debug-internal.log']. This may take a bit of time if the files are large.
[WARNING|modeling_utils.py:388] 2022-03-27 06:09:57,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 06:10:05,598 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 06:10:09,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 06:10:13,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0345, 'learning_rate': 4.040462427745664e-05, 'epoch': 8.97}
[WARNING|modeling_utils.py:388] 2022-03-27 06:10:17,727 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 06:10:24,109 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 06:10:32,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 90%|████████████████████████████████████████████████████████████████▋ | 2002/2230 [12:59:02<10:30:50, 166.01s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 06:10:38,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:10:40,914 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:10:46,731 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:10:48,980 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 06:10:52,698 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0374, 'learning_rate': 4.005780346820808e-05, 'epoch': 8.98}
[WARNING|modeling_bart.py:1051] 2022-03-27 06:10:56,981 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:10:59,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:01,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:03,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:05,044 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:06,991 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:08,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:10,842 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:12,599 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:14,300 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:15,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:19,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:20,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:22,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:25,263 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:26,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:29,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:30,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:32,941 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:35,222 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:37,035 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:39,845 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:41,411 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0245, 'learning_rate': 3.936416184971098e-05, 'epoch': 9.0}
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:45,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:49,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:52,978 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:11:56,748 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:00,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:04,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:08,243 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:12:08,243 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:12:11,996 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:12:11,996 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:12:11,996 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:12:15,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0298, 'learning_rate': 3.901734104046242e-05, 'epoch': 9.01}
{'loss': 0.0394, 'learning_rate': 3.884393063583814e-05, 'epoch': 9.01}
{'loss': 0.0377, 'learning_rate': 3.867052023121387e-05, 'epoch': 9.02}
90%|██████████████████████████████████████████████████████████████████▊ | 2012/2230 [13:02:32<1:47:49, 29.68s/it]
{'loss': 0.0317, 'learning_rate': 3.849710982658959e-05, 'epoch': 9.02}
{'loss': 0.0363, 'learning_rate': 3.832369942196531e-05, 'epoch': 9.03}
90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0391, 'learning_rate': 3.815028901734104e-05, 'epoch': 9.03} 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2014/2230 [13:03:28<1:43:01, 28.62s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0324, 'learning_rate': 3.797687861271676e-05, 'epoch': 9.04} 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.042, 'learning_rate': 3.780346820809248e-05, 'epoch': 9.04} 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████▊ | 2015/2230 [13:03:55<1:40:48, 28.13s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.029, 'learning_rate': 3.76300578034682e-05, 'epoch': 9.04}
{'loss': 0.0427, 'learning_rate': 3.745664739884393e-05, 'epoch': 9.05}
{'loss': 0.0366, 'learning_rate': 3.728323699421965e-05, 'epoch': 9.05}
91%|███████████████████████████████████████████████████████████████████ | 2020/2230 [13:06:07<1:33:34, 26.74s/it]
{'loss': 0.0306, 'learning_rate': 3.710982658959537e-05, 'epoch': 9.06}
{'loss': 0.0419, 'learning_rate': 3.6936416184971096e-05, 'epoch': 9.06}
{'loss': 0.0289, 'learning_rate': 3.6763005780346816e-05, 'epoch': 9.07}
91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it]
{'loss': 0.0274, 'learning_rate': 3.6416184971098265e-05, 'epoch': 9.08}
{'loss': 0.2688, 'learning_rate': 3.6242774566473985e-05, 'epoch': 9.08}
91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0241, 'learning_rate': 3.6069364161849706e-05, 'epoch': 9.09} 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 91%|███████████████████████████████████████████████████████████████████▏ | 2023/2230 [13:07:25<1:30:13, 26.15s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0307, 'learning_rate': 3.5895953757225427e-05, 'epoch': 9.09}
91%|███████████████████████████████████████████████████████████████████▎ | 2028/2230 [13:09:32<1:25:41, 25.45s/it]
{'loss': 0.0317, 'learning_rate': 3.5722543352601154e-05, 'epoch': 9.09}
[WARNING|modeling_utils.py:388] 2022-03-27 06:21:23,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0328, 'learning_rate': 3.5549132947976875e-05, 'epoch': 9.1}
91%|███████████████████████████████████████████████████████████████████▎ | 2030/2230 [13:10:21<1:23:21, 25.01s/it]
{'loss': 0.0224, 'learning_rate': 3.520231213872832e-05, 'epoch': 9.11}
91%|███████████████████████████████████████████████████████████████████▍ | 2032/2230 [13:11:11<1:22:02, 24.86s/it]
{'loss': 0.0383, 'learning_rate': 3.5028901734104043e-05, 'epoch': 9.11}
{'loss': 0.0319, 'learning_rate': 3.4855491329479764e-05, 'epoch': 9.12}
[WARNING|modeling_utils.py:388] 2022-03-27 06:23:16,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 06:23:30,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0283, 'learning_rate': 3.4682080924855485e-05, 'epoch': 9.12}
{'loss': 0.0184, 'learning_rate': 3.450867052023121e-05, 'epoch': 9.13}
{'loss': 0.0273, 'learning_rate': 3.433526011560693e-05, 'epoch': 9.13}
[WARNING|modeling_utils.py:388] 2022-03-27 06:24:36,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0288, 'learning_rate': 3.4161849710982654e-05, 'epoch': 9.13}
91%|███████████████████████████████████████████████████████████████████▋ | 2038/2230 [13:13:32<1:15:42, 23.66s/it]
{'loss': 0.0342, 'learning_rate': 3.398843930635838e-05, 'epoch': 9.14}
91%|███████████████████████████████████████████████████████████████████▋ | 2039/2230 [13:13:55<1:14:14, 23.32s/it]
{'loss': 0.0355, 'learning_rate': 3.38150289017341e-05, 'epoch': 9.14}
{'loss': 0.0316, 'learning_rate': 3.364161849710982e-05, 'epoch': 9.15}
[WARNING|modeling_utils.py:388] 2022-03-27 06:26:00,143 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0263, 'learning_rate': 3.346820809248554e-05, 'epoch': 9.15}
[WARNING|modeling_utils.py:388] 2022-03-27 06:26:18,866 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 06:26:25,390 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 06:26:29,290 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0373, 'learning_rate': 3.329479768786127e-05, 'epoch': 9.16}
[WARNING|modeling_utils.py:388] 2022-03-27 06:26:37,189 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 06:26:47,542 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0224, 'learning_rate': 3.312138728323699e-05, 'epoch': 9.16}
[WARNING|modeling_utils.py:388] 2022-03-27 06:26:57,914 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
92%|███████████████████████████████████████████████████████████████████▊ | 2044/2230 [13:15:41<1:06:46, 21.54s/it]
{'loss': 0.029, 'learning_rate': 3.294797687861271e-05, 'epoch': 9.17}
[WARNING|modeling_utils.py:388] 2022-03-27 06:27:20,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 06:27:24,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 06:27:32,407 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 06:27:36,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 06:27:40,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 06:27:45,124 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 06:27:49,079 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
92%|███████████████████████████████████████████████████████████████████▉ | 2046/2230 [13:16:20<1:02:35, 20.41s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 06:27:55,042 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 06:27:57,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:01,510 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:03,761 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 06:28:07,537 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 06:28:09,772 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0351, 'learning_rate': 3.242774566473988e-05, 'epoch': 9.18}
[WARNING|modeling_utils.py:388] 2022-03-27 06:28:13,113 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 06:28:15,268 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 06:28:17,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 06:28:19,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:23,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:25,518 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:27,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:31,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:33,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:35,710 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:37,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:39,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:41,546 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:43,395 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:45,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:47,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:49,023 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:52,598 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:54,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:56,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:28:57,977 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:00,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:02,410 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:04,141 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:05,796 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:07,482 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:09,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:12,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:14,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:15,627 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:17,209 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:20,279 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:21,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:24,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:26,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:27,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:30,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:32,936 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:34,234 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:36,880 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:38,118 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:40,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:42,854 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:45,127 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:46,333 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:48,497 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:50,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:52,598 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:54,648 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:57,432 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:29:59,285 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:01,107 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:02,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:05,472 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:07,011 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:08,482 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0711, 'learning_rate': 3.069364161849711e-05, 'epoch': 9.22}
[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:11,922 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:15,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:19,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:22,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:26,654 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:30,225 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:33,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:37,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0487, 'learning_rate': 3.052023121387283e-05, 'epoch': 9.23}
[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:41,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:44,653 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:48,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:51,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:55,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:30:58,823 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:02,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:05,880 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:09,525 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:13,042 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:16,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:20,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:23,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:27,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:30,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:34,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:37,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:41,110 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:44,543 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:47,961 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:51,338 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:31:54,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0409, 'learning_rate': 2.9999999999999997e-05, 'epoch': 9.24}
{'loss': 0.038, 'learning_rate': 2.982658959537572e-05, 'epoch': 9.25}
{'loss': 0.0301, 'learning_rate': 2.9653179190751446e-05, 'epoch': 9.25}
{'loss': 0.0407, 'learning_rate': 2.9479768786127166e-05, 'epoch': 9.26}
{'loss': 0.0388, 'learning_rate': 2.930635838150289e-05, 'epoch': 9.26}
{'loss': 0.0413, 'learning_rate': 2.9132947976878608e-05, 'epoch': 9.26}
{'loss': 0.0317, 'learning_rate': 2.895953757225433e-05, 'epoch': 9.27}
{'loss': 0.0397, 'learning_rate': 2.8786127167630052e-05, 'epoch': 9.27}
{'loss': 0.0456, 'learning_rate': 2.8612716763005776e-05, 'epoch': 9.28}
 93%|████████████████████████████████████████████████████████████████████▋ | 2070/2230 [13:24:33<1:10:48, 26.55s/it]
{'loss': 0.0348, 'learning_rate': 2.8439306358381497e-05, 'epoch': 9.28}
{'loss': 0.0273, 'learning_rate': 2.826589595375722e-05, 'epoch': 9.29}
93%|████████████████████████████████████████████████████████████████████▊ | 2072/2230 [13:25:25<1:08:54, 26.17s/it]
{'loss': 0.0381, 'learning_rate': 2.8092485549132945e-05, 'epoch': 9.29}
{'loss': 0.0338, 'learning_rate': 2.7919075144508666e-05, 'epoch': 9.3}
{'loss': 0.0308, 'learning_rate': 2.774566473988439e-05, 'epoch': 9.3}
{'loss': 0.0325, 'learning_rate': 2.757225433526011e-05, 'epoch': 9.3}
{'loss': 0.0207, 'learning_rate': 2.7398843930635835e-05, 'epoch': 9.31}
{'loss': 0.0315, 'learning_rate': 2.722543352601156e-05, 'epoch': 9.31}
{'loss': 0.0329, 'learning_rate': 2.705202312138728e-05, 'epoch': 9.32}
93%|████████████████████████████████████████████████████████████████████▉ | 2079/2230 [13:28:22<1:03:29, 25.23s/it]
{'loss': 0.0331, 'learning_rate': 2.6878612716763003e-05, 'epoch': 9.32}
93%|█████████████████████████████████████████████████████████████████████ | 2080/2230 [13:28:47<1:02:39, 25.06s/it]
{'loss': 0.0251, 'learning_rate': 2.6705202312138724e-05, 'epoch': 9.33}
93%|█████████████████████████████████████████████████████████████████████ | 2080/2230 [13:28:47<1:02:39, 25.06s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 93%|█████████████████████████████████████████████████████████████████████ | 2080/2230 [13:28:47<1:02:39, 25.06s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 93%|█████████████████████████████████████████████████████████████████████ | 2080/2230 [13:28:47<1:02:39, 25.06s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 93%|█████████████████████████████████████████████████████████████████████ | 2080/2230 [13:28:47<1:02:39, 25.06s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 93%|█████████████████████████████████████████████████████████████████████ | 2080/2230 [13:28:47<1:02:39, 25.06s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 93%|█████████████████████████████████████████████████████████████████████ | 2080/2230 [13:28:47<1:02:39, 25.06s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 93%|█████████████████████████████████████████████████████████████████████ | 2080/2230 [13:28:47<1:02:39, 25.06s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
93%|█████████████████████████████████████████████████████████████████████ | 2080/2230 [13:28:47<1:02:39, 25.06s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 93%|█████████████████████████████████████████████████████████████████████ | 2080/2230 [13:28:47<1:02:39, 25.06s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0311, 'learning_rate': 2.6531791907514448e-05, 'epoch': 9.33} 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 93%|█████████████████████████████████████████████████████████████████████ | 2081/2230 [13:29:12<1:01:51, 24.91s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
93%|█████████████████████████████████████████████████████████████████████ | 2082/2230 [13:29:37<1:01:35, 24.97s/it]
{'loss': 0.0378, 'learning_rate': 2.635838150289017e-05, 'epoch': 9.34}
{'loss': 0.0282, 'learning_rate': 2.6184971098265893e-05, 'epoch': 9.34}
[WARNING|modeling_utils.py:388] 2022-03-27 06:41:52,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0314, 'learning_rate': 2.6011560693641617e-05, 'epoch': 9.35}
[WARNING|modeling_utils.py:388] 2022-03-27 06:42:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0311, 'learning_rate': 2.5838150289017338e-05, 'epoch': 9.35}
{'loss': 0.034, 'learning_rate': 2.5664739884393062e-05, 'epoch': 9.35}
{'loss': 0.0324, 'learning_rate': 2.5491329479768782e-05, 'epoch': 9.36}
{'loss': 0.0358, 'learning_rate': 2.5317919075144507e-05, 'epoch': 9.36}
{'loss': 0.0237, 'learning_rate': 2.514450867052023e-05, 'epoch': 9.37}
{'loss': 0.0342, 'learning_rate': 2.497109826589595e-05, 'epoch': 9.37}
[WARNING|modeling_utils.py:388] 2022-03-27 06:44:25,368 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 06:44:25,368 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:44:25,368 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:44:25,368 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:44:25,368 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 94%|███████████████████████████████████████████████████████████████████████▎ | 2091/2230 [13:33:05<52:30, 22.67s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 94%|███████████████████████████████████████████████████████████████████████▎ | 2091/2230 [13:33:05<52:30, 22.67s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0208, 'learning_rate': 2.4797687861271675e-05, 'epoch': 9.38} 94%|███████████████████████████████████████████████████████████████████████▎ | 2091/2230 [13:33:05<52:30, 22.67s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
94%|███████████████████████████████████████████████████████████████████████▎ | 2091/2230 [13:33:05<52:30, 22.67s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 94%|███████████████████████████████████████████████████████████████████████▎ | 2091/2230 [13:33:05<52:30, 22.67s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 94%|███████████████████████████████████████████████████████████████████████▎ | 2091/2230 [13:33:05<52:30, 22.67s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 94%|███████████████████████████████████████████████████████████████████████▎ | 2091/2230 [13:33:05<52:30, 22.67s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 94%|███████████████████████████████████████████████████████████████████████▎ | 2091/2230 [13:33:05<52:30, 22.67s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 94%|███████████████████████████████████████████████████████████████████████▎ | 2091/2230 [13:33:05<52:30, 22.67s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 94%|███████████████████████████████████████████████████████████████████████▎ | 2091/2230 [13:33:05<52:30, 22.67s/it]g-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:44:58,273 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:44:58,273 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:44:58,273 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:45:02,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:45:02,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:45:02,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:45:02,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:45:02,311 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:45:02,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:45:02,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:45:02,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:45:18,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:45:18,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0341, 'learning_rate': 2.445086705202312e-05, 'epoch': 9.39} [WARNING|modeling_utils.py:388] 2022-03-27 06:45:18,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 06:45:18,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:45:18,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:45:28,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:45:28,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:45:28,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:45:28,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:45:28,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 06:45:28,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:45:28,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:45:28,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0236, 'learning_rate': 2.427745664739884e-05, 'epoch': 9.39} [WARNING|modeling_utils.py:388] 2022-03-27 06:45:28,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:45:47,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:45:47,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:45:47,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 06:45:47,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:45:55,040 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:45:55,040 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:45:59,582 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 94%|███████████████████████████████████████████████████████████████████████▍ | 2095/2230 [13:34:29<47:50, 21.26s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 94%|███████████████████████████████████████████████████████████████████████▍ | 2095/2230 [13:34:29<47:50, 21.26s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0257, 'learning_rate': 2.4104046242774565e-05, 'epoch': 9.39} [WARNING|modeling_bart.py:1051] 2022-03-27 06:46:05,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 06:46:05,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:46:09,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:46:09,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:46:09,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:46:09,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:46:09,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:46:19,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 06:46:19,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:46:21,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:46:21,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:46:21,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:46:27,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:46:29,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 06:46:29,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 06:46:34,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 06:46:37,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0304, 'learning_rate': 2.375722543352601e-05, 'epoch': 9.4}
[WARNING|modeling_bart.py:1051] 2022-03-27 06:46:42,203 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:46:44,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:46:46,675 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:46:48,903 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:46:51,074 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 06:46:54,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 06:46:58,712 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:00,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:02,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:04,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:06,998 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:09,051 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:11,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:13,058 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:15,122 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:17,067 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:19,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:20,934 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:22,844 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:24,710 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:26,555 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:28,430 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:31,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:33,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:34,867 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:36,644 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:38,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:41,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:43,523 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:45,283 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:46,919 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:50,135 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:51,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:53,293 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:56,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:57,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:47:59,442 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:02,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:03,719 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:06,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:07,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:10,501 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:11,816 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:14,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:15,560 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:17,932 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:17,932 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:20,364 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:22,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:24,687 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:26,696 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:26,696 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:28,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:30,601 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:32,502 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:34,296 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:34,296 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:36,158 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:38,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:40,851 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:41,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:41,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:45,005 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:45,005 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:48,664 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 06:48:48,664 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:52,332 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:52,332 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:55,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:55,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:48:59,519 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:49:03,117 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:06,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:10,263 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0482, 'learning_rate': 2.184971098265896e-05, 'epoch': 9.45}
[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:13,949 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:17,528 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:21,059 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:24,593 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:28,119 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:31,649 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:35,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:38,661 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:42,230 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:45,768 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:49,177 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:52,675 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:56,141 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:49:59,588 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:03,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:06,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0355, 'learning_rate': 2.1502890173410405e-05, 'epoch': 9.46}
[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:10,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:13,594 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:17,010 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:17,010 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0358, 'learning_rate': 2.1329479768786126e-05, 'epoch': 9.47} [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0419, 'learning_rate': 2.115606936416185e-05, 'epoch': 9.47} [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0411, 'learning_rate': 2.098265895953757e-05, 'epoch': 9.48} [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0319, 'learning_rate': 2.080924855491329e-05, 'epoch': 9.48} [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 06:50:20,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0288, 'learning_rate': 2.0635838150289012e-05, 'epoch': 9.48}
{'loss': 0.0269, 'learning_rate': 2.0462427745664736e-05, 'epoch': 9.49}
{'loss': 0.0355, 'learning_rate': 2.028901734104046e-05, 'epoch': 9.49}
{'loss': 0.032, 'learning_rate': 2.011560693641618e-05, 'epoch': 9.5}
{'loss': 0.0251, 'learning_rate': 1.9942196531791905e-05, 'epoch': 9.5}
{'loss': 0.0295, 'learning_rate': 1.9768786127167626e-05, 'epoch': 9.51}
{'loss': 0.0405, 'learning_rate': 1.959537572254335e-05, 'epoch': 9.51}
95%|████████████████████████████████████████████████████████████████████████▎ | 2122/2230 [13:43:57<46:58, 26.10s/it]
{'loss': 0.0289, 'learning_rate': 1.942196531791907e-05, 'epoch': 9.52}
{'loss': 0.0337, 'learning_rate': 1.9248554913294795e-05, 'epoch': 9.52}
 95%|████████████████████████████████████████████████████████████████████████▍ | 2124/2230 [13:44:48<45:35, 25.81s/it]
{'loss': 0.035, 'learning_rate': 1.907514450867052e-05, 'epoch': 9.52}
{'loss': 0.029, 'learning_rate': 1.890173410404624e-05, 'epoch': 9.53}
 95%|████████████████████████████████████████████████████████████████████████▍ | 2126/2230 [13:45:39<44:38, 25.75s/it]
{'loss': 0.03, 'learning_rate': 1.8728323699421963e-05, 'epoch': 9.53}
{'loss': 0.0287, 'learning_rate': 1.8554913294797684e-05, 'epoch': 9.54}
{'loss': 0.0354, 'learning_rate': 1.8381502890173408e-05, 'epoch': 9.54}
{'loss': 0.0262, 'learning_rate': 1.8208092485549132e-05, 'epoch': 9.55}
{'loss': 0.0225, 'learning_rate': 1.8034682080924853e-05, 'epoch': 9.55}
{'loss': 0.0288, 'learning_rate': 1.7861271676300577e-05, 'epoch': 9.56}
[WARNING|modeling_utils.py:388] 2022-03-27 06:59:26,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0241, 'learning_rate': 1.7687861271676298e-05, 'epoch': 9.56}
[WARNING|modeling_bart.py:1051] 2022-03-27 06:59:45,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0277, 'learning_rate': 1.7514450867052022e-05, 'epoch': 9.57}
96%|████████████████████████████████████████████████████████████████████████▋ | 2134/2230 [13:48:55<38:54, 24.32s/it]
{'loss': 0.0298, 'learning_rate': 1.7341040462427742e-05, 'epoch': 9.57}
[WARNING|modeling_utils.py:388] 2022-03-27 07:00:46,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0278, 'learning_rate': 1.7167630057803466e-05, 'epoch': 9.57}
[WARNING|modeling_utils.py:388] 2022-03-27 07:01:07,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
96%|████████████████████████████████████████████████████████████████████████▊ | 2136/2230 [13:49:42<37:27, 23.91s/it]
{'loss': 0.0253, 'learning_rate': 1.699421965317919e-05, 'epoch': 9.58}
[WARNING|modeling_utils.py:388] 2022-03-27 07:01:23,707 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0331, 'learning_rate': 1.682080924855491e-05, 'epoch': 9.58}
[WARNING|modeling_utils.py:388] 2022-03-27 07:01:48,210 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0272, 'learning_rate': 1.6647398843930635e-05, 'epoch': 9.59}
[WARNING|modeling_utils.py:388] 2022-03-27 07:02:23,045 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0237, 'learning_rate': 1.6473988439306356e-05, 'epoch': 9.59}
{'loss': 0.027, 'learning_rate': 1.630057803468208e-05, 'epoch': 9.6}
[WARNING|modeling_utils.py:388] 2022-03-27 07:02:49,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:02:53,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 07:03:08,171 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.026, 'learning_rate': 1.61271676300578e-05, 'epoch': 9.6}
[WARNING|modeling_bart.py:1051] 2022-03-27 07:03:12,281 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:03:16,338 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:03:24,444 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
96%|█████████████████████████████████████████████████████████████████████████ | 2142/2230 [13:51:58<32:47, 22.36s/it]
{'loss': 0.0281, 'learning_rate': 1.5953757225433525e-05, 'epoch': 9.61}
[WARNING|modeling_utils.py:388] 2022-03-27 07:03:52,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:03:56,652 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:04:03,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
96%|█████████████████████████████████████████████████████████████████████████ | 2144/2230 [13:52:41<31:22, 21.89s/it]
{'loss': 0.0188, 'learning_rate': 1.560693641618497e-05, 'epoch': 9.61}
[WARNING|modeling_utils.py:388] 2022-03-27 07:04:19,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:04:25,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 07:04:31,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 96%|█████████████████████████████████████████████████████████████████████████ | 2145/2230 [13:53:01<30:18, 21.39s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 07:04:35,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 07:04:40,199 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:04:46,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 07:04:50,353 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 07:04:56,010 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 07:05:00,010 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:05:02,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:05:08,246 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:05:10,533 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:14,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 07:05:18,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:22,493 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:24,678 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:26,854 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:28,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:31,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:33,289 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:35,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:37,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:39,507 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:41,587 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:43,628 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:45,629 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:47,706 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:49,663 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:51,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:53,521 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:55,445 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:57,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:05:59,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:01,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:03,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:05,798 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:07,601 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:09,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:11,101 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:14,561 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:16,219 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:18,024 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:19,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:21,324 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:22,950 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:26,134 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:27,685 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:32,324 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:33,789 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:35,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:37,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:39,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:42,177 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:43,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:47,265 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:48,472 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:50,873 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:53,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:54,412 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:56,599 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:06:58,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:00,844 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:02,779 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:04,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:06,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:08,429 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:10,882 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:12,463 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:14,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.056, 'learning_rate': 1.3352601156069362e-05, 'epoch': 9.67}
[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:18,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:21,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:25,308 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:28,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:32,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:36,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:39,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:43,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:47,010 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:50,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:07:54,109 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 07:07:54,109 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 07:07:57,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 07:07:57,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 07:08:01,203 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 07:08:01,203 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 07:08:04,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:08,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:11,650 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0395, 'learning_rate': 1.3005780346820809e-05, 'epoch': 9.68}
[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:15,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:18,750 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:22,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:25,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:29,181 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:32,612 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:36,064 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:39,473 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:42,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:46,418 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:49,816 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:08:53,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0302, 'learning_rate': 1.2658959537572253e-05, 'epoch': 9.69}
{'loss': 0.037, 'learning_rate': 1.2485549132947976e-05, 'epoch': 9.7}
{'loss': 0.0276, 'learning_rate': 1.2312138728323698e-05, 'epoch': 9.7}
{'loss': 0.0316, 'learning_rate': 1.213872832369942e-05, 'epoch': 9.7}
{'loss': 0.0344, 'learning_rate': 1.1965317919075144e-05, 'epoch': 9.71}
{'loss': 0.0365, 'learning_rate': 1.1791907514450867e-05, 'epoch': 9.71}
{'loss': 0.0248, 'learning_rate': 1.161849710982659e-05, 'epoch': 9.72}
97%|█████████████████████████████████████████████████████████████████████████▉ | 2168/2230 [14:00:45<27:20, 26.47s/it]
{'loss': 0.027, 'learning_rate': 1.1445086705202312e-05, 'epoch': 9.72}
{'loss': 0.0345, 'learning_rate': 1.1271676300578034e-05, 'epoch': 9.73}
97%|█████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [14:01:38<26:32, 26.54s/it]
{'loss': 0.0264, 'learning_rate': 1.1098265895953756e-05, 'epoch': 9.73}
{'loss': 0.0279, 'learning_rate': 1.092485549132948e-05, 'epoch': 9.74}
{'loss': 0.0295, 'learning_rate': 1.0751445086705203e-05, 'epoch': 9.74}
{'loss': 0.034, 'learning_rate': 1.0578034682080925e-05, 'epoch': 9.74}
97%|██████████████████████████████████████████████████████████████████████████ | 2174/2230 [14:03:21<24:04, 25.80s/it]
{'loss': 0.0232, 'learning_rate': 1.0404624277456646e-05, 'epoch': 9.75}
{'loss': 0.028, 'learning_rate': 1.0231213872832368e-05, 'epoch': 9.75}
98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it]
{'loss': 0.0351, 'learning_rate': 1.005780346820809e-05, 'epoch': 9.76}
98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0379, 'learning_rate': 9.884393063583813e-06, 'epoch': 9.76} 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0293, 'learning_rate': 9.710982658959535e-06, 'epoch': 9.77} 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0337, 'learning_rate': 9.53757225433526e-06, 'epoch': 9.77} 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [14:04:12<23:05, 25.66s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▎ | 2180/2230 [14:05:50<20:43, 24.87s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▎ | 2180/2230 [14:05:50<20:43, 24.87s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0362, 'learning_rate': 9.364161849710982e-06, 'epoch': 9.78}
98%|██████████████████████████████████████████████████████████████████████████▎ | 2181/2230 [14:06:15<20:11, 24.73s/it]
{'loss': 0.0274, 'learning_rate': 9.190751445086704e-06, 'epoch': 9.78}
98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it]
{'loss': 0.0325, 'learning_rate': 9.017341040462426e-06, 'epoch': 9.78}
{'loss': 0.0346, 'learning_rate': 8.843930635838149e-06, 'epoch': 9.79}
98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [14:06:40<19:48, 24.76s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0263, 'learning_rate': 8.670520231213871e-06, 'epoch': 9.79} Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.035, 'learning_rate': 8.497109826589595e-06, 'epoch': 9.8} [WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 07:19:12,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:08:14<17:28, 23.84s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:08:14<17:28, 23.84s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0327, 'learning_rate': 8.323699421965318e-06, 'epoch': 9.8} 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:08:14<17:28, 23.84s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:08:14<17:28, 23.84s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:08:14<17:28, 23.84s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:08:14<17:28, 23.84s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:08:14<17:28, 23.84s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:08:14<17:28, 23.84s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:08:14<17:28, 23.84s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:08:14<17:28, 23.84s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|██████████████████████████████████████████████████████████████████████████▌ | 2186/2230 [14:08:14<17:28, 23.84s/it] Setting `use_cache=False`...e computed-27 05:53:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0339, 'learning_rate': 8.15028901734104e-06, 'epoch': 9.81}
{'loss': 0.0233, 'learning_rate': 7.976878612716762e-06, 'epoch': 9.81}
[WARNING|modeling_utils.py:388] 2022-03-27 07:20:44,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.028, 'learning_rate': 7.803468208092485e-06, 'epoch': 9.82}
{'loss': 0.0294, 'learning_rate': 7.630057803468207e-06, 'epoch': 9.82}
[WARNING|modeling_utils.py:388] 2022-03-27 07:21:26,004 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:21:30,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0224, 'learning_rate': 7.45664739884393e-06, 'epoch': 9.83}
[WARNING|modeling_bart.py:1051] 2022-03-27 07:21:44,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 07:21:50,182 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0323, 'learning_rate': 7.283236994219652e-06, 'epoch': 9.83}
[WARNING|modeling_utils.py:388] 2022-03-27 07:22:04,775 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:22:08,696 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:22:16,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 07:22:21,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0337, 'learning_rate': 7.109826589595374e-06, 'epoch': 9.83}
[WARNING|modeling_utils.py:388] 2022-03-27 07:22:26,887 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:22:33,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
98%|██████████████████████████████████████████████████████████████████████████▊ | 2194/2230 [14:11:11<13:00, 21.69s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 07:22:45,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 07:22:54,091 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 07:22:58,272 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:23:01,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.013, 'learning_rate': 6.76300578034682e-06, 'epoch': 9.84}
[WARNING|modeling_utils.py:388] 2022-03-27 07:23:08,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:23:14,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 07:23:18,468 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 07:23:22,441 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0371, 'learning_rate': 6.589595375722542e-06, 'epoch': 9.85}
[WARNING|modeling_utils.py:388] 2022-03-27 07:23:28,409 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 07:23:32,628 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:23:34,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 07:23:38,810 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:23:41,056 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0245, 'learning_rate': 6.4161849710982654e-06, 'epoch': 9.85}
[WARNING|modeling_utils.py:388] 2022-03-27 07:23:46,740 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:23:48,930 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:23:51,078 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:23:53,193 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:23:55,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:23:57,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0334, 'learning_rate': 6.242774566473988e-06, 'epoch': 9.86}
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:01,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:03,256 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:05,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:07,252 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:09,204 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:11,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:13,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
99%|██████████████████████████████████████████████████████████████████████████▉ | 2199/2230 [14:12:42<09:25, 18.24s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:15,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:16,948 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:18,850 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:20,724 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:22,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:24,397 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:28,007 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
99%|██████████████████████████████████████████████████████████████████████████▉ | 2200/2230 [14:12:58<08:44, 17.48s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:30,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:32,594 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:34,352 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:37,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:39,447 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:41,062 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
99%|███████████████████████████████████████████████████████████████████████████ | 2201/2230 [14:13:11<07:53, 16.34s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:44,363 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:45,922 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:47,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:49,001 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:52,013 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:53,497 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
99%|███████████████████████████████████████████████████████████████████████████ | 2202/2230 [14:13:23<07:02, 15.09s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:56,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:57,857 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:24:59,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:01,940 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:03,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:05,886 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
99%|███████████████████████████████████████████████████████████████████████████ | 2203/2230 [14:13:34<06:12, 13.81s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:07,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:09,704 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:12,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:14,364 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
99%|███████████████████████████████████████████████████████████████████████████ | 2204/2230 [14:13:44<05:25, 12.52s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:16,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:17,768 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:19,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:21,978 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
99%|███████████████████████████████████████████████████████████████████████████▏| 2205/2230 [14:13:52<04:41, 11.28s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:25,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:25,989 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:28,731 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:30,510 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
99%|███████████████████████████████████████████████████████████████████████████▏| 2206/2230 [14:14:00<04:02, 10.10s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:32,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:34,778 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:36,364 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:38,593 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
99%|███████████████████████████████████████████████████████████████████████████▏| 2207/2230 [14:14:07<03:31, 9.21s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:40,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0398, 'learning_rate': 4.682080924855491e-06, 'epoch': 9.9}
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:44,397 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:48,093 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:51,694 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:55,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:25:58,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:02,377 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:05,897 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
99%|███████████████████████████████████████████████████████████████████████████▎| 2208/2230 [14:14:36<05:34, 15.21s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:09,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:13,056 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:16,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:19,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:23,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:26,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:30,180 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:33,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
99%|███████████████████████████████████████████████████████████████████████████▎| 2209/2230 [14:15:03<06:37, 18.93s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:37,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:40,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:44,006 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:47,429 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:50,845 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:54,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:26:57,671 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:01,059 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
99%|███████████████████████████████████████████████████████████████████████████▎| 2210/2230 [14:15:31<07:09, 21.47s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:07,882 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:11,270 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:14,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:17,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:21,304 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:24,653 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:27:28,051 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
99%|███████████████████████████████████████████████████████████████████████████▎| 2211/2230 [14:15:58<07:19, 23.12s/it]
{'loss': 0.035, 'learning_rate': 3.8150289017341036e-06, 'epoch': 9.92}
{'loss': 0.0309, 'learning_rate': 3.641618497109826e-06, 'epoch': 9.92}
{'loss': 0.0347, 'learning_rate': 3.4682080924855487e-06, 'epoch': 9.93}
{'loss': 0.0296, 'learning_rate': 3.294797687861271e-06, 'epoch': 9.93}
99%|███████████████████████████████████████████████████████████████████████████▌| 2216/2230 [14:18:08<05:53, 25.24s/it]
{'loss': 0.0381, 'learning_rate': 2.9479768786127167e-06, 'epoch': 9.94}
99%|███████████████████████████████████████████████████████████████████████████▌| 2218/2230 [14:18:57<04:59, 24.95s/it]
{'loss': 0.0363, 'learning_rate': 2.774566473988439e-06, 'epoch': 9.95}
[WARNING|modeling_utils.py:388] 2022-03-27 07:30:40,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
100%|███████████████████████████████████████████████████████████████████████████▋| 2219/2230 [14:19:22<04:34, 25.00s/it]
{'loss': 0.0239, 'learning_rate': 2.6011560693641614e-06, 'epoch': 9.95}
[WARNING|modeling_bart.py:1051] 2022-03-27 07:31:13,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0394, 'learning_rate': 2.427745664739884e-06, 'epoch': 9.96}
{'loss': 0.0335, 'learning_rate': 2.2543352601156066e-06, 'epoch': 9.96}
{'loss': 0.0213, 'learning_rate': 2.0809248554913294e-06, 'epoch': 9.96}
[WARNING|modeling_utils.py:388] 2022-03-27 07:32:14,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:32:26,987 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:32:30,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:32:37,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.034, 'learning_rate': 1.7341040462427744e-06, 'epoch': 9.97}
[WARNING|modeling_bart.py:1051] 2022-03-27 07:32:53,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 07:32:57,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 07:33:01,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0296, 'learning_rate': 1.560693641618497e-06, 'epoch': 9.98}
[WARNING|modeling_bart.py:1051] 2022-03-27 07:33:11,909 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:33:14,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 07:33:16,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 07:33:19,607 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:33:21,639 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:33:23,653 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:33:25,779 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:33:27,746 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:33:29,641 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:33:31,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:33:33,305 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:33:35,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:33:36,816 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:33:40,236 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:33:41,843 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:33:43,418 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:33:46,351 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:33:47,752 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:33:50,470 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:33:51,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:33:54,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:33:55,521 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:33:57,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:33:59,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:34:02,672 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:34:04,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 07:34:06,652 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0134, 'learning_rate': 6.936416184971098e-07, 'epoch': 10.0}
[INFO|configuration_utils.py:438] 2022-03-27 07:34:06,768 >> Configuration saved in ./config.json
[INFO|configuration_utils.py:438] 2022-03-27 07:34:18,633 >> Configuration saved in ./config.json
Upload file wandb/run-20220326_171130-bdf5nvyg/logs/debug-internal.log: 2%|▏ | 192k/11.2M [00:01<01:14, 155kB/s]
Upload file wandb/run-20220326_171130-bdf5nvyg/logs/debug-internal.log: 8%|▌ | 864k/11.2M [00:03<00:37, 291kB/s]
Upload file wandb/run-20220326_171130-bdf5nvyg/logs/debug-internal.log: 14%|▉ | 1.53M/11.2M [00:05<00:31, 323kB/s]
Upload file wandb/run-20220326_171130-bdf5nvyg/logs/debug-internal.log: 17%|█▏ | 1.94M/11.2M [00:07<00:37, 260kB/s]
Upload file wandb/run-20220326_171130-bdf5nvyg/logs/debug-internal.log: 22%|█▌ | 2.41M/11.2M [00:09<00:36, 250kB/s]
Upload file wandb/run-20220326_171130-bdf5nvyg/logs/debug-internal.log: 25%|█▋ | 2.78M/11.2M [00:11<00:39, 223kB/s]
Upload file wandb/run-20220326_171130-bdf5nvyg/logs/debug-internal.log: 29%|█▉ | 3.19M/11.2M [00:13<00:38, 215kB/s]
Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 47%|██████ | 204M/434M [00:25<00:14, 17.2MB/s]
Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 55%|███████▏ | 238M/434M [00:27<00:11, 17.4MB/s]
Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 60%|███████▊ | 260M/434M [00:29<00:13, 13.8MB/s]
Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 64%|████████▎ | 280M/434M [00:31<00:12, 12.7MB/s]
Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 72%|█████████▎ | 313M/434M [00:33<00:08, 15.1MB/s]
Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 80%|██████████▎ | 345M/434M [00:35<00:05, 16.0MB/s]
Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 87%|███████████▎ | 378M/434M [00:37<00:03, 16.4MB/s]
Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 95%|████████████▎| 411M/434M [00:39<00:01, 16.9MB/s]
Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 98%|████████████▊| 427M/434M [00:40<00:00, 16.8MB/s]
Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 100%|█████████████| 434M/434M [00:58<00:00, 16.8MB/s]
Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 100%|█████████████| 434M/434M [00:58<00:00, 16.8MB/s]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 100%|█████████████| 434M/434M [00:58<00:00, 16.8MB/s]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 100%|█████████████| 434M/434M [00:58<00:00, 16.8MB/s]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 100%|█████████████| 434M/434M [00:58<00:00, 16.8MB/s]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 100%|█████████████| 434M/434M [00:58<00:00, 16.8MB/s]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 100%|█████████████| 434M/434M [00:58<00:00, 16.8MB/s]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 100%|█████████████| 434M/434M [00:58<00:00, 16.8MB/s]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 100%|█████████████| 434M/434M [00:58<00:00, 16.8MB/s]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 100%|█████████████| 434M/434M [00:58<00:00, 16.8MB/s]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 100%|█████████████| 434M/434M [00:58<00:00, 16.8MB/s]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 100%|█████████████| 434M/434M [00:58<00:00, 16.8MB/s]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 100%|█████████████| 434M/434M [00:58<00:00, 16.8MB/s]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 100%|█████████████| 434M/434M [00:58<00:00, 16.8MB/s]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 100%|█████████████| 434M/434M [00:58<00:00, 16.8MB/s]g-point operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
03/27/2022 07:38:16 - WARNING - huggingface_hub.repository - To https://huggingface.co/sanchit-gandhi/wav2vec2-2-bart-large-cnn
Upload file runs/Mar26_17-11-01_sanchit--v100/events.out.tfevents.1648314690.sanchit--v100.2600125.0: 100%|█| 352k/352k
[INFO|modelcard.py:460] 2022-03-27 07:38:19,288 >> Dropping the following result as it does not have all the necessary fields:
03/27/2022 07:38:51 - WARNING - huggingface_hub.repository - To https://huggingface.co/sanchit-gandhi/wav2vec2-2-bart-large-cnn
Upload file wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb: 100%|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]
***** train metrics *****
  epoch                    =        10.0
  train_loss               =      1.2905
  train_runtime            = 14:22:36.40
  train_samples            =       28538
  train_samples_per_second =       5.514
  train_steps_per_second   =       0.043
[INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642
[INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-27 07:38:54,563 >> Num examples = 2642|█████████████| 434M/434M [00:25<00:00, 18.1MB/s]ields:t operations will not be computed-27 07:27:04,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
03/27/2022 07:50:57 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow
***** eval metrics *****
  epoch                   = 10.0
  eval_loss               = 0.3578
  eval_runtime            = 0:12:02.44
  eval_samples            = 2642
  eval_samples_per_second = 3.657
  eval_steps_per_second   = 0.458
  eval_wer                = 0.0932
03/27/2022 07:51:54 - WARNING - huggingface_hub.repository - To https://huggingface.co/sanchit-gandhi/wav2vec2-2-bart-large-cnn
Upload file runs/Mar26_17-11-01_sanchit--v100/events.out.tfevents.1648367457.sanchit--v100.2600125.2: 100%|█| 358/358 [0
formers/src/transformers/modelcard.py", line 611, in from_trainer
    info = model_info(self.finetuned_from)